Average rank of AI agents on MLE-bench by end of 2024?
Top 10% • 25%
Top 25% • 25%
Top 50% • 25%
Below 50% • 25%
OpenAI Releases MLE-bench with 75 Kaggle Competitions to Evaluate AI Agents' ML Engineering Skills
Oct 10, 2024, 05:33 PM
OpenAI has announced the release of MLE-bench, a new benchmark designed to evaluate the machine learning engineering capabilities of AI agents. The benchmark comprises 75 real-world machine learning engineering competitions sourced from Kaggle. MLE-bench measures how well AI agents perform practical ML engineering tasks, bridging the gap between theoretical AI knowledge and real-world application. The benchmark could accelerate the development of AI agents capable of writing machine learning code, potentially leading to self-improving AI systems, and it raises the prospect of AI agents eventually achieving Kaggle Grandmaster status.
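The core measurement behind a market like this, where an agent's submission lands on a Kaggle-style leaderboard, can be sketched in a few lines. This is an illustrative helper, not MLE-bench's actual grading API; the function name and signature are assumptions.

```python
def leaderboard_rank(agent_score, leaderboard_scores, higher_is_better=True):
    """Rank an agent's submission against a Kaggle-style leaderboard.

    Returns (rank, percentile): rank 1 is best, and percentile is the
    fraction of human entries the agent strictly beats.
    """
    if higher_is_better:
        beaten = sum(s < agent_score for s in leaderboard_scores)
    else:
        beaten = sum(s > agent_score for s in leaderboard_scores)
    rank = len(leaderboard_scores) - beaten + 1
    percentile = beaten / len(leaderboard_scores)
    return rank, percentile


# Example: an agent scoring 0.85 against four human entries
rank, pct = leaderboard_rank(0.85, [0.9, 0.8, 0.7, 0.6])
# rank == 2, pct == 0.75 (beats 3 of 4 entries, i.e. top 25%)
```

An agent beating 75% of human entries on average would resolve this question as "Top 25%" under the options above.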