DeepNewz Markets

Open-Source DeepSeek-V3, Backed by High-Flyer, Outperforms GPT-4o and Claude 3.5 Sonnet on Several Benchmarks

Jan 2, 2025, 01:03 PM

DeepSeek-V3, an open-source AI model developed by the Chinese AI firm DeepSeek and backed by the quantitative hedge fund High-Flyer, has emerged as a major competitor to leading AI models from OpenAI and Meta. The model has 671 billion total parameters, of which 37 billion are activated for each token, and was trained on 14.8 trillion high-quality tokens with a budget of about $6 million, using 2,048 Nvidia H800 GPUs over roughly two months. That cost contrasts sharply with the $100 million reportedly spent on training GPT-4o. DeepSeek-V3 uses a Mixture-of-Experts (MoE) architecture together with techniques such as Multi-Head Latent Attention (MLA) and auxiliary-loss-free load balancing to improve efficiency and scalability. It performs strongly on benchmarks for coding, mathematics, and long-context understanding, outperforming GPT-4o and Claude 3.5 Sonnet in several areas, including coding, and it is particularly strong on Chinese-language tasks. Because the model is open source, developers and researchers can access it freely, which positions it as a disruptive force in the AI landscape. Despite US export controls on advanced AI chips, DeepSeek achieved these results with Nvidia H800 chips that were available to it in China. The release underscores the growing competitiveness of open-source AI and raises questions about the safety and implications of releasing powerful AI tools to the public.
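To put the training figures in perspective, the short Python sketch below works through the arithmetic they imply. It uses only the numbers quoted in the story, plus one assumption that is not in the source: "two months" is treated as 60 days of continuous use of all 2,048 GPUs. The function name `training_summary` is illustrative, and the results are rough estimates, not reported values.

```python
# Figures quoted in the story.
TOTAL_PARAMS = 671e9        # total parameters
ACTIVE_PARAMS = 37e9        # parameters activated at a time
TRAINING_TOKENS = 14.8e12   # training tokens
BUDGET_USD = 6e6            # approximate training budget
NUM_GPUS = 2048             # Nvidia H800 GPUs

# Assumption (not from the story): "two months" means 60 days of
# round-the-clock use of every GPU.
ASSUMED_TRAINING_DAYS = 60


def training_summary(days=ASSUMED_TRAINING_DAYS):
    """Derive rough efficiency figures from the quoted numbers."""
    gpu_hours = NUM_GPUS * days * 24
    return {
        "active_fraction": ACTIVE_PARAMS / TOTAL_PARAMS,
        "gpu_hours": gpu_hours,
        "usd_per_gpu_hour": BUDGET_USD / gpu_hours,
        "tokens_per_gpu_hour": TRAINING_TOKENS / gpu_hours,
    }


if __name__ == "__main__":
    for name, value in training_summary().items():
        print(f"{name}: {value:.4g}")
```

Under the 60-day assumption, this works out to about 2.95 million GPU-hours, or roughly $2 per GPU-hour, with about 5.5% of the model's parameters active at a time. Changing the `days` argument shows how sensitive the per-hour figures are to the exact training duration.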

