Primary use case for DeepSeek-V3 by end of 2025?
Coding • 25%
Natural Language Processing • 25%
Data Analysis • 25%
Other • 25%
Resolution source: Surveys or reports by AI industry analysts
DeepSeek Unveils 671B-Parameter DeepSeek-V3 AI Model, Outperforms GPT-4o, Hits 60 Tokens/Sec
Dec 26, 2024, 05:37 PM
DeepSeek has officially released DeepSeek-V3, a new open-source AI language model built on a 671-billion-parameter Mixture-of-Experts (MoE) architecture with 37 billion parameters activated per token. The model reportedly outperforms leading models such as GPT-4o, Claude 3.5 Sonnet, and Llama 3.1 405B on various benchmarks, including the Aider Polyglot Benchmark, which tests language models on coding exercises across multiple programming languages. DeepSeek-V3 scores 48% on that benchmark, a sharp improvement over the 17% of its predecessor, DeepSeek-V2.5. The model was trained on 14.8 trillion high-quality tokens using 2.788 million H800 GPU hours over less than two months, at a reported training cost of $5.6 million. DeepSeek-V3 generates 60 tokens per second, three times faster than the previous version, and supports a context length of 128,000 tokens. It uses auxiliary-loss-free load balancing and FP8 mixed-precision training, and it achieves high sparsity by routing each token to only 8 of 256 experts. The release includes fully open-source model weights and papers. API pricing is set at $0.27 per million input tokens and $1.10 per million output tokens.
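To make the sparsity figures concrete, here is a minimal sketch of top-8-of-256 expert routing, assuming a simplified sigmoid gate and illustrative dimensions (D_MODEL is a placeholder; the real router, expert layers, and load-balancing mechanism are more involved than this):

```python
import torch

# Illustrative config: 256 routed experts with 8 active per token matches
# the announcement; D_MODEL is a hypothetical placeholder size.
NUM_EXPERTS = 256
TOP_K = 8
D_MODEL = 1024

def route_tokens(hidden: torch.Tensor, gate_weight: torch.Tensor):
    """Score all experts per token, keep the top 8, renormalize their weights.

    hidden:      (num_tokens, D_MODEL) token representations
    gate_weight: (D_MODEL, NUM_EXPERTS) router projection
    """
    scores = hidden @ gate_weight        # (num_tokens, NUM_EXPERTS)
    affinity = scores.sigmoid()          # simplified gating function
    top_vals, top_idx = affinity.topk(TOP_K, dim=-1)
    weights = top_vals / top_vals.sum(dim=-1, keepdim=True)
    return top_idx, weights              # which experts fire, and how much

tokens = torch.randn(4, D_MODEL)
gate = torch.randn(D_MODEL, NUM_EXPERTS) * 0.02
expert_idx, expert_w = route_tokens(tokens, gate)
print(expert_idx.shape, expert_w.shape)  # torch.Size([4, 8]) for each
```

Only the selected experts' feed-forward blocks execute for a given token, which is how just 37 billion of the 671 billion parameters end up activated per token.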
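The headline cost figures reduce to straightforward arithmetic, reproduced in the sketch below. The $2-per-GPU-hour rental rate is an assumption (it is the rate DeepSeek's technical report uses to arrive at roughly $5.6 million), and the request example is hypothetical:

```python
# Training cost: GPU hours times an assumed rental rate. The ~$2/GPU-hour
# figure is the assumption behind the reported $5.6M, not a market quote.
H800_GPU_HOURS = 2.788e6
RENTAL_RATE_USD = 2.00
print(f"Estimated training cost: ${H800_GPU_HOURS * RENTAL_RATE_USD / 1e6:.2f}M")
# -> Estimated training cost: $5.58M

# API pricing from the release: $0.27 per million input tokens,
# $1.10 per million output tokens.
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call at the announced rates."""
    return input_tokens / 1e6 * 0.27 + output_tokens / 1e6 * 1.10

# Hypothetical example: a 100k-token prompt (well within the 128k context
# window) answered with a 2k-token completion.
print(f"Example request: ${request_cost(100_000, 2_000):.4f}")
# -> Example request: $0.0292
```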