Will GPT-5 outperform Llama8B on the GSM8K benchmark by the end of 2024?
Yes • 50%
No • 50%
Official GSM8K benchmark results publication
Tiny LLM Llama8B Outperforms GPT-4 with 96.7% on GSM8K Math Benchmark, 200x Fewer Parameters
Jun 16, 2024, 11:20 PM
Recent advancements in Large Language Models (LLMs) have shown significant improvements in mathematical reasoning tasks. Tiny LLMs such as Llama8B have achieved a 96.7% score on the GSM8K math benchmark, surpassing GPT-4, Claude, and Gemini despite having 200 times fewer parameters. This success is attributed to techniques such as Monte Carlo Tree Search (MCTS) with backpropagation, similar to the approach Google DeepMind used to master the game of Go. Additionally, vLLM now supports FP8 quantization, improving performance and efficiency. Open-source LLMs like Qwen 2 and Nemotron are advancing rapidly, with fine-tunes expected to match top models like Gemini and GPT-4 Turbo. Llama-3 70B was surpassed within weeks of its release.
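The MCTS-with-backpropagation technique mentioned above can be illustrated with a minimal sketch. This is not the actual method from the reported result; it is a generic UCT-style tree search where the `expand` and `simulate` callbacks are hypothetical stand-ins for an LLM refining an answer and self-evaluating it:

```python
import math

class Node:
    """A node in a Monte Carlo Tree Search over candidate answers."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # running sum of rewards

    def uct_score(self, c=1.41):
        # Upper Confidence Bound for Trees: balances exploitation
        # (average reward) against exploration (rarely visited nodes).
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def select(root):
    """Descend the tree, always picking the child with the best UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.uct_score())
    return node

def backpropagate(node, reward):
    """Propagate a simulation reward up to the root, updating statistics."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts_iteration(root, expand, simulate):
    """One MCTS iteration: select a leaf, expand it, score it, backpropagate."""
    leaf = select(root)
    child = expand(leaf)      # hypothetically: ask the LLM to refine the answer
    reward = simulate(child)  # hypothetically: self-evaluate the refinement
    backpropagate(child, reward)
```

After many iterations, the most-visited path from the root is taken as the model's final answer; the backpropagation step is what lets reward signals from deep refinements shape earlier choices.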