OpenAI Releases MLE-bench with 75 Kaggle Competitions to Evaluate AI Agents' ML Engineering Skills
Oct 10, 2024, 05:33 PM
OpenAI has announced the release of MLE-bench, a new benchmark designed to evaluate the machine learning engineering capabilities of AI agents. The benchmark comprises 75 real-world machine learning engineering competitions sourced from Kaggle. MLE-bench aims to measure how well AI agents perform practical machine learning engineering tasks, bridging the gap between theoretical AI knowledge and real-world application. The release of this benchmark could accelerate the development of AI agents capable of writing machine learning code, potentially leading to self-improving AI systems. It also raises the prospect of AI agents one day achieving Kaggle Grandmaster status.
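Because the competitions are sourced from Kaggle, an agent's submission can be scored the same way a human competitor's would be: by comparing its score against the competition's public leaderboard. The sketch below illustrates that idea in the simplest possible form; it is a hypothetical helper for illustration only, not MLE-bench's actual grading code, and the function name and signature are assumptions.

```python
def leaderboard_percentile(agent_score, leaderboard_scores, higher_is_better=True):
    """Return the percentage of leaderboard entries the agent's score beats.

    Illustrative only -- not MLE-bench's real grading logic. Some Kaggle
    metrics (e.g. accuracy) are higher-is-better, others (e.g. log loss)
    are lower-is-better, so the comparison direction is a parameter.
    """
    if not leaderboard_scores:
        raise ValueError("leaderboard is empty")
    beats = (lambda a, b: a > b) if higher_is_better else (lambda a, b: a < b)
    beaten = sum(beats(agent_score, s) for s in leaderboard_scores)
    return 100.0 * beaten / len(leaderboard_scores)
```

For example, an agent scoring 0.90 accuracy against a leaderboard of [0.50, 0.80, 0.95] beats two of three entries, placing it at roughly the 67th percentile.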