Positron, a private AI chip startup, has raised $51.6 million in Series A funding to challenge Nvidia’s dominance in AI inference chips with its Atlas hardware platform. The company claims its specialized inference accelerators deliver 2x to 5x better performance per watt and dollar compared to Nvidia’s solutions, targeting enterprises seeking more efficient AI deployment without requiring liquid cooling or extreme power densities.
What you should know: Positron’s Atlas chip is already in production and shipping just 15 months after the company’s founding, with confirmed deployments at major enterprises, including the web security and content-delivery provider Cloudflare, and across sectors such as networking, gaming, and content delivery.
• The system supports models of up to half a trillion parameters in a single 2kW server and runs Hugging Face transformer models through an OpenAI API-compatible endpoint (see the sketch after this list).
• Atlas achieves 93% memory bandwidth utilization, versus the 10-30% typical of GPUs, while drawing 66% less power than Nvidia’s H100.
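Because the endpoint speaks the OpenAI API, existing client code should need little more than a new base URL. Here is a minimal sketch using the official `openai` Python client; the endpoint URL and model name are hypothetical placeholders, since Positron has not published those details:

```python
# Minimal sketch: querying an OpenAI-API-compatible inference endpoint.
# The base_url and model name are hypothetical placeholders, not
# published Positron details.
from openai import OpenAI

client = OpenAI(
    base_url="https://atlas.example.com/v1",  # hypothetical Atlas endpoint
    api_key="YOUR_API_KEY",                   # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # any served HF transformer
    messages=[{"role": "user", "content": "Explain memory-bound inference."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same pattern works with any OpenAI-compatible server, which is the point: switching backends becomes a configuration change rather than a code rewrite.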
The big picture: The AI inference market represents a massive opportunity as companies shift from model training to deployment, but it’s also increasingly volatile—rival startup Groq recently slashed its 2025 revenue projections from $2 billion to $500 million.
• “We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” said Mitesh Agrawal, Positron’s CEO and former Lambda COO.
Why this matters: Modern AI workloads have fundamentally shifted from compute-intensive tasks to memory-bound transformer architectures, creating an opportunity for specialized hardware that prioritizes memory capacity and bandwidth over raw computational power.
• While Nvidia continues to focus on compute scaling, Positron is betting on memory-first design, because transformer inference requires a near 1:1 ratio of compute to memory operations (a back-of-envelope sketch follows below).
In plain English: Older AI models were like powerful calculators that needed lots of computing muscle; today’s AI models are more like vast libraries that need quick access to enormous amounts of information stored in memory rather than heavy number-crunching.
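A rough worked example of that compute-to-memory ratio, sketched in Python with approximate public hardware figures (the specific numbers are assumptions for illustration, not from the article):

```python
# Back-of-envelope: why batch-1 transformer decoding is memory-bound.
# Hardware figures below are rough public numbers, not vendor-verified.

def gemv_flops_per_byte(d_in: int, d_out: int, bytes_per_weight: int = 2) -> float:
    """Arithmetic intensity of one fp16 matrix-vector product (batch size 1)."""
    flops = 2 * d_in * d_out                       # one multiply-add per weight
    bytes_moved = bytes_per_weight * d_in * d_out  # each weight read once
    return flops / bytes_moved

# Regardless of layer size, a decode step does ~1 FLOP per byte of weights read:
print(gemv_flops_per_byte(8192, 8192))  # -> 1.0

# An H100-class GPU offers on the order of 1e15 fp16 FLOP/s against roughly
# 3.35e12 B/s of HBM bandwidth, i.e. ~300 FLOPs per byte of bandwidth. With
# demand near 1 FLOP/byte, the compute units mostly wait on memory, so
# bandwidth utilization, not peak compute, sets the inference speed.
print(1e15 / 3.35e12)  # ~298.5
```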
What’s coming next: Positron’s next-generation Titan platform, launching in 2026, will support models of up to 16 trillion parameters, backed by as much as two terabytes of high-speed memory per accelerator.
• Built on custom “Asimov” silicon, Titan is designed to operate with standard air cooling in conventional data centers, avoiding the liquid-cooled configurations that next-generation GPUs increasingly require.
• The platform targets multi-trillion-parameter models, such as the anticipated GPT-5, which many in the industry view as necessary steps toward artificial general intelligence.
Engineering approach: Positron designed its system as a drop-in replacement for existing infrastructure, allowing customers to run model binaries trained on Nvidia hardware without rewriting any code.
• “If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Thomas Sohmers, Positron co-founder and CTO.
• “The CUDA moat isn’t something to fight. It’s an ecosystem to participate in,” Agrawal explained.
Funding and production: The oversubscribed Series A round was led by Valor Equity Partners, Atreides Management, and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund, and Unless.
• Positron’s first-generation chips were fabricated in the U.S. using Intel facilities, with final server assembly also based domestically for geopolitical resilience and supply chain stability.
• For the Asimov chip, fabrication will shift to TSMC while keeping as much of the production chain in the U.S. as possible.
What they’re saying: Industry veterans emphasize the importance of focusing on hardware economics rather than bundling with proprietary services.
• “If you can’t convince a customer to deploy your hardware based on its economics, you’re not going to be profitable,” Agrawal said.
• “A key differentiator is our ability to run frontier AI models with better efficiency—achieving 2x to 5x performance per watt and dollar compared to Nvidia,” Sohmers explained.