Top AI model on SWE-bench by March 2025?
Llama 3.1 8B • 25%
GPT-4 • 25%
Llama 3.1 70B • 25%
Other • 25%
Published results from SWE-bench evaluations
Meta AI Advances Coding LLMs with RLEF, Llama 3.1 Outperforms GPT-4 on CodeContests
Oct 4, 2024, 01:57 PM
Meta AI has introduced Reinforcement Learning with Execution Feedback (RLEF), a technique for coding large language models (LLMs) that integrates execution feedback at training time to improve performance at inference time. Applied to fine-tune Llama 3.1 models, the approach yields an 8B model that surpasses GPT-4 on DeepMind's CodeContests and a 70B model that achieves state-of-the-art results. The method has also been validated through extensive evaluations, including on SWE-bench, demonstrating its effectiveness for code generation tasks. The evaluations were conducted on a cloud-based infrastructure that speeds them up by 30x.
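The full RLEF training loop is not spelled out in the story, but its core signal, scoring a model's candidate program by actually executing it against test cases, can be illustrated. The sketch below is a minimal, hypothetical illustration (the `solve` function name and reward shape are assumptions, not Meta's implementation): the model call is stubbed out, and only the execution-feedback reward is shown.

```python
# Hedged sketch of an execution-feedback reward: run a candidate program
# against (input, expected_output) test cases and score it by pass rate.
# This is an illustration of the idea, not Meta's RLEF implementation.

def execution_feedback_reward(program: str, test_cases: list[tuple[str, str]]) -> float:
    """Execute `program` (expected to define a function `solve`) and
    return the fraction of test cases it passes."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # compile and load the candidate solution
    except Exception:
        return 0.0  # non-compiling programs get zero reward
    solve = namespace.get("solve")
    if not callable(solve):
        return 0.0
    passed = 0
    for inp, expected in test_cases:
        try:
            if str(solve(inp)).strip() == expected.strip():
                passed += 1
        except Exception:
            pass  # runtime errors count as failed tests
    return passed / len(test_cases)

# A correct candidate earns full reward; a broken one earns zero.
good = "def solve(x):\n    return int(x) * 2\n"
bad = "def solve(x):\n    return int(x) + 1\n"
tests = [("3", "6"), ("10", "20")]
print(execution_feedback_reward(good, tests))  # → 1.0
print(execution_feedback_reward(bad, tests))   # → 0.0
```

In a full RL setup this scalar would serve as the reward for a policy-gradient update on the generating model; a production system would also sandbox execution rather than call `exec` directly.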