Tiny LLM Llama8B Outperforms GPT-4 on GSM8K Math Benchmark with 96.7%, Despite 200x Fewer Parameters
Jun 16, 2024, 11:20 PM
Recent advances in large language models (LLMs) have brought significant improvements in mathematical reasoning. A small model, Llama8B, scored 96.7% on the GSM8K math benchmark, surpassing GPT-4, Claude, and Gemini despite having 200 times fewer parameters. The result is attributed to Monte Carlo Tree Search (MCTS), in which the outcome of each explored path is backpropagated up the search tree. Google's DeepMind used a similar approach in AlphaGo to master Go.
Elsewhere, vLLM now supports FP8 quantization, improving inference performance and efficiency. Open-source LLMs such as Qwen2 and Nemotron are advancing rapidly, and fine-tunes of them are expected to match top models like Gemini and GPT-4 Turbo. Llama 3 70B was surpassed within weeks.
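The story credits the score to Monte Carlo Tree Search with backpropagation. The following is a minimal sketch of generic UCT-style MCTS. It shows the four phases (selection, expansion, simulation, backpropagation) on a hypothetical toy task: choose three digits from 0 to 2 so that their sum is as large as possible. This is not the Llama8B method. In an LLM setting the moves would presumably be model-generated reasoning steps or answer revisions, and the reward would come from some evaluator. The story does not describe those details. All names (`DEPTH`, `ACTIONS`, `reward`, `Node`, `mcts`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

DEPTH = 3            # hypothetical: number of "steps" in the toy problem
ACTIONS = (0, 1, 2)  # hypothetical: choices available at each step


def reward(path: list[int]) -> float:
    """Toy score in [0, 1]: larger digits are better, [2, 2, 2] scores 1.0."""
    return sum(path) / (max(ACTIONS) * DEPTH)


@dataclass
class Node:
    path: list[int]
    parent: Optional["Node"] = None
    children: dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    total: float = 0.0  # sum of rewards backpropagated through this node

    def is_terminal(self) -> bool:
        return len(self.path) == DEPTH

    def untried(self) -> list[int]:
        return [a for a in ACTIONS if a not in self.children]


def best_uct_child(node: Node, c: float) -> Node:
    """Pick the child maximizing mean value plus the UCB1 exploration bonus."""
    def score(child: Node) -> float:
        mean = child.total / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def mcts(iterations: int = 2000, c: float = math.sqrt(2), seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    root = Node(path=[])
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.is_terminal() and not node.untried():
            node = best_uct_child(node, c)
        # 2. Expansion: attach one untried child unless the node is terminal.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(path=node.path + [action], parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the path with random moves and score it.
        remaining = DEPTH - len(node.path)
        rollout = node.path + [rng.choice(ACTIONS) for _ in range(remaining)]
        value = reward(rollout)
        # 4. Backpropagation: add the result to every node on the way to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read off the answer by following the most-visited child at each level.
    path, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(action)
    return path


if __name__ == "__main__":
    print(mcts())
```

Backpropagation here means pushing rollout scores up the search tree, not gradient backpropagation through a neural network. Visit counts concentrate on the branches whose averaged scores are highest, and the answer is read from those counts.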