DeepNewz Markets

Market

Which LLM will be the top performer on the GSM8K benchmark by the end of 2024?

Large Language Models • Llama • Claude • Gemini • Monte Carlo Tree Search • Google • Go • Nemotron

Resolution / Starting Odds

Llama8B • 25%

GPT-5 • 25%

Claude • 25%

Gemini • 25%

Official GSM8K benchmark results publication

Story

Tiny LLM Llama8B Outperforms GPT-4 with 96.7% on GSM8K Math Benchmark, 200x Fewer Parameters

Jun 16, 2024, 11:20 PM

Recent advances in Large Language Models (LLMs) have brought significant improvements in mathematical reasoning. A tiny LLM, Llama8B, reportedly scored 96.7% on the GSM8K math benchmark, surpassing GPT-4, Claude, and Gemini despite having 200 times fewer parameters. The result is attributed to Monte Carlo Tree Search (MCTS) with backpropagation of results through the search tree, similar to the techniques Google used to master Go. In related news, vLLM now supports FP8 quantization, which improves inference performance and efficiency. Open-source LLMs such as Qwen 2 and Nemotron are also advancing rapidly, with fine-tunes expected to match top models like Gemini and GPT-4 Turbo; Llama-3 70B was superseded within weeks.
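For readers unfamiliar with MCTS, here is a minimal Python sketch of its four phases (selection, expansion, rollout, backpropagation) on a toy problem. It is not the method behind the reported GSM8K score: the bit-guessing task, the `TARGET` value, the reward function, and every name below are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical toy task: pick 4 bits; reward is the fraction matching TARGET.
TARGET = (1, 0, 1, 1)


class Node:
    def __init__(self, state=(), parent=None):
        self.state = state          # tuple of bits chosen so far
        self.parent = parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.state) == len(TARGET)

    def untried_actions(self):
        return [a for a in (0, 1) if a not in self.children]


def reward(state):
    return sum(s == t for s, t in zip(state, TARGET)) / len(TARGET)


def ucb1(child, parent_visits, c=1.4):
    # Mean reward plus an exploration bonus for rarely visited children.
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(node):
    # Descend while the node is fully expanded, following the best UCB1 score.
    while not node.is_terminal() and not node.untried_actions():
        parent = node
        node = max(parent.children.values(),
                   key=lambda ch: ucb1(ch, parent.visits))
    return node


def expand(node, rng):
    # Add one unexplored child, unless the node is already terminal.
    if node.is_terminal():
        return node
    action = rng.choice(node.untried_actions())
    child = Node(node.state + (action,), parent=node)
    node.children[action] = child
    return child


def rollout(state, rng):
    # Complete the state with random bits and score the result.
    while len(state) < len(TARGET):
        state = state + (rng.choice((0, 1)),)
    return reward(state)


def backpropagate(node, value):
    # Push the rollout result up to the root, updating statistics on the way.
    while node is not None:
        node.visits += 1
        node.total_reward += value
        node = node.parent


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        leaf = expand(select(root), rng)
        backpropagate(leaf, rollout(leaf.state, rng))
    return root


root = mcts()
# The most visited child of the root is the search's preferred first move.
best_first_action = max(root.children, key=lambda a: root.children[a].visits)
```

Note that "backpropagation" here means propagating rollout statistics up the search tree, which is a different operation from gradient backpropagation in neural network training.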

Similar markets

Best Performing LLM on GSM1k by End of 2024

Phi • 20%

Mistral • 20%

GPT-4 • 20%

Claude • 20%

Gemini • 20%

Most Improved LLM on GSM1k by End of 2024

Phi • 20%

Mistral • 20%

GPT-4 • 20%

Claude • 20%

Gemini • 20%

Which LLM will be ranked highest on Scale AI SEAL Leaderboards by end of 2024?

GPT-4 • 25%

PaLM 2 • 25%

Claude • 25%

LLaMA • 25%

Meta AI GSM8K Benchmark Accuracy Above 90% by End of 2024?

Yes • 50%

No • 50%

Llama 3 exceeds GPT in 2024 benchmarks?

Yes • 50%

No • 50%

Llama3-8B surpasses Llama3-70B on reasoning benchmark by end of 2024?

Yes • 50%

No • 50%

Top multilingual model on Open LLM Leaderboard by end of 2024?

Qwen2 • 25%

Llama 3 • 25%

GLM 4 • 25%

Other • 25%

Which AI chip leads in performance by end of 2024?

Google leads • 33%

Competitor A leads • 33%

Competitor B leads • 33%

Which LLM will be in the top 3 for Math domain on SEAL Leaderboards as of November 30, 2024?

Model X • 33%

Model Y • 33%

Model Z • 33%

Llama 3 surpasses GPT-4 in benchmarks by end of 2024?

Yes • 50%

No • 50%

Llama 3 Outperforms GPT-4 in Benchmarks by April 2025?

Yes • 50%

No • 50%

Who will be the main competitor to Codestral-22B by end of 2024?

LLaMA 3 70B • 33%

DeepSeek 33B • 33%

A new entrant • 34%

Markets based on same story

Will GPT-5 outperform Llama8B on the GSM8K benchmark by the end of 2024?

Yes • 50%

No • 50%

Will Llama8B maintain or exceed a 96.7% score on the next GSM8K benchmark?

Yes • 50%

No • 50%

Will top tech companies adopt Llama8B by the end of 2024?

No • 50%

Yes • 50%

Which AI model will be the subject of the top AI research paper in 2024?

Llama8B-related • 25%

Other • 25%

Gemini-related • 25%

GPT-5-related • 25%