Llama 3.1 8B model outperforms GPT-4 on CodeContests by end of 2024?
Yes • 50%
No • 50%
Results from an independent evaluation on CodeContests published by a credible AI research organization or journal
Meta AI Advances Coding LLMs with RLEF, Llama 3.1 Outperforms GPT-4 on CodeContests
Oct 4, 2024, 01:57 PM
Meta AI has introduced Reinforcement Learning with Execution Feedback (RLEF), a significant advance for coding large language models (LLMs). The technique integrates execution feedback at training time to improve performance at inference time. Applied to fine-tune Llama 3.1 models, RLEF enabled the 8B model to surpass GPT-4 on DeepMind's CodeContests and the 70B model to achieve state-of-the-art results. The method was further validated through extensive evaluations, including on SWE-bench, demonstrating its effectiveness for code generation tasks. The evaluations were run on cloud-based infrastructure that speeds them up by 30x.
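To make the core idea concrete, below is a minimal sketch of an execution-feedback reward, the signal that RLEF-style training uses: generated code is run against test cases, and the pass rate becomes a scalar reward for reinforcement learning. This is not Meta's implementation; the model generation and RL update are omitted, and the (stdin, expected_stdout) test-case format is an assumption made for illustration.

```python
# Sketch of an execution-feedback reward for RLEF-style training (illustrative only).
import subprocess
import sys
import tempfile


def execution_reward(candidate_code: str,
                     test_cases: list[tuple[str, str]],
                     timeout: float = 2.0) -> float:
    """Run candidate_code against each (stdin, expected_stdout) test case and
    return the fraction of cases passed, usable as a scalar RL reward."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code)
        path = f.name

    passed = 0
    for stdin_data, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, path],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # timeouts count as failures
    return passed / len(test_cases) if test_cases else 0.0


if __name__ == "__main__":
    # Toy "generated" solution: print the sum of two integers read from stdin.
    candidate = "a, b = map(int, input().split())\nprint(a + b)\n"
    tests = [("1 2", "3"), ("10 -4", "6")]
    print(f"execution-feedback reward: {execution_reward(candidate, tests):.2f}")
    # In an RLEF-style loop this reward would drive a policy-gradient update
    # (e.g., PPO) on the code-generating model; that step is omitted here.
```

The key design point the article highlights is that this feedback is used at training time, so the fine-tuned model internalizes it rather than relying on repeated execute-and-retry loops at inference time.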