Top AI model on SWE-bench by March 2025?
Llama 3.1 8B • 25%
GPT-4 • 25%
Llama 3.1 70B • 25%
Other • 25%
Published results from SWE-bench evaluations
Meta AI Advances Coding LLMs with RLEF, Llama 3.1 Outperforms GPT-4 on CodeContests
Oct 4, 2024, 01:57 PM
Meta AI has introduced Reinforcement Learning with Execution Feedback (RLEF), a technique for coding large language models (LLMs) that integrates execution feedback at training time to improve performance at inference time. Applied to fine-tune Llama 3.1 models, the approach yields an 8B model that surpasses GPT-4 on DeepMind's CodeContests and a 70B model that achieves state-of-the-art results. The method has also been validated through extensive evaluations, including on SWE-bench, demonstrating its effectiveness for code generation tasks. The evaluations were conducted on a cloud-based infrastructure that speeds them up by 30x.
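The full RLEF training loop is not spelled out in the story, but its core signal, scoring a model's candidate program by actually executing it against test cases, can be illustrated. The sketch below is a minimal, hypothetical illustration (the `solve` function name and reward shape are assumptions, not Meta's implementation): the model call is stubbed out, and only the execution-feedback reward is shown.

```python
# Hedged sketch of an execution-feedback reward: run a candidate program
# against (input, expected_output) test cases and score it by pass rate.
# This is an illustration of the idea, not Meta's RLEF implementation.

def execution_feedback_reward(program: str, test_cases: list[tuple[str, str]]) -> float:
    """Execute `program` (expected to define a function `solve`) and
    return the fraction of test cases it passes."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # compile and load the candidate solution
    except Exception:
        return 0.0  # non-compiling programs get zero reward
    solve = namespace.get("solve")
    if not callable(solve):
        return 0.0
    passed = 0
    for inp, expected in test_cases:
        try:
            if str(solve(inp)).strip() == expected.strip():
                passed += 1
        except Exception:
            pass  # runtime errors count as failed tests
    return passed / len(test_cases)

# A correct candidate earns full reward; a broken one earns zero.
good = "def solve(x):\n    return int(x) * 2\n"
bad = "def solve(x):\n    return int(x) + 1\n"
tests = [("3", "6"), ("10", "20")]
print(execution_feedback_reward(good, tests))  # → 1.0
print(execution_feedback_reward(bad, tests))   # → 0.0
```

In a full RL setup this scalar would serve as the reward for a policy-gradient update on the generating model; a production system would also sandbox execution rather than call `exec` directly.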