DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.
Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that selectively activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance (a simplified routing sketch follows this list).
- The model introduces an auxiliary-loss-free load-balancing strategy that optimizes expert utilization without compromising performance
- A new multi-token prediction feature allows the model to generate 60 tokens per second, three times faster than previous versions
- The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
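Below is a minimal, illustrative PyTorch sketch of the core idea behind sparse mixture-of-experts routing: a router scores all experts for each token, but only the top-k experts actually run, so most of the layer's parameters sit idle for any given token. The `ToyMoELayer` name, layer sizes, and expert count are hypothetical and far smaller than DeepSeek-V3's; DeepSeekMoE's real routing (shared plus routed experts and its auxiliary-loss-free balancing) differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy sparse MoE layer: each token is processed by only top_k of num_experts experts."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)                # 16 tokens, d_model = 64
print(ToyMoELayer()(x).shape)          # torch.Size([16, 64])
```

Only 2 of the 8 toy experts run per token here; scaled up, this is how a 671B-parameter model can keep roughly 37B parameters active per token.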
Technical specifications: The model was trained on 14.8T high-quality tokens and supports a context window of up to 128K tokens.
- DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
- The training process included supervised fine-tuning and reinforcement learning to align the model with human preferences
- The company implemented various training optimizations, including FP8 mixed-precision training and the DualPipe algorithm for pipeline parallelism (the FP8 scaling idea is sketched below)
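For context on the FP8 point, here is a minimal sketch of per-tensor scaled FP8 storage, the basic idea that makes 8-bit floating point usable for training. It assumes PyTorch 2.1+ (which exposes `torch.float8_e4m3fn`); the helper names are made up, and DeepSeek's actual FP8 pipeline uses finer-grained scaling and custom kernels, so treat this as an illustration of the concept only.

```python
import torch

E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3fn

def to_fp8(x: torch.Tensor):
    """Scale a tensor into FP8 range, cast it, and return the cast plus its scale factor."""
    scale = x.abs().max().clamp(min=1e-12) / E4M3_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate high-precision tensor from FP8 storage."""
    return x_fp8.to(torch.float32) * scale

w = torch.randn(256, 256) * 0.02        # stand-in for a weight or activation tensor
w_fp8, s = to_fp8(w)
err = (from_fp8(w_fp8, s) - w).abs().max()
print(f"storage dtype: {w_fp8.dtype}, max abs rounding error: {err.item():.2e}")
```

The payoff is memory and bandwidth: values travel in 8 bits while a single high-precision scale per tensor (or per block, in finer-grained schemes) preserves dynamic range.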
Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.
- The entire training run required approximately 2.788 million H800 GPU hours, costing about $5.57 million (a quick arithmetic check follows this list)
- This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
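As a sanity check on those figures, the implied GPU rental rate works out to roughly $2 per H800 GPU hour; this is illustrative arithmetic using only the numbers quoted above:

```python
gpu_hours = 2_788_000       # ~2.788M H800 GPU hours
total_cost = 5_570_000      # ~$5.57 million
print(f"implied rate: ${total_cost / gpu_hours:.2f} per H800 GPU hour")  # ≈ $2.00
```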
Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.
- The model outperforms open-source competitors like Llama-3.1-405B and Qwen 2.5-72B
- It shows particular strength in Chinese language and mathematical tasks, scoring 90.2 on the Math-500 test
- While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES
Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.
- The code is accessible via GitHub under an MIT license
- Users can access the model through DeepSeek Chat or via API for commercial applications
- API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8, 2025 (a quick cost example follows this list)
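At those rates, estimating a bill is simple arithmetic. A small example, with hypothetical workload sizes:

```python
INPUT_PER_M = 0.27    # $ per million input tokens
OUTPUT_PER_M = 1.10   # $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in dollars for a given token volume."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a workload of 10M input tokens and 2M output tokens
print(f"${request_cost(10_000_000, 2_000_000):.2f}")   # $4.90
```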
Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.