MLE-bench adopted as standard benchmark by major AI conference by mid-2025?
Yes • 50%
No • 50%
Announcements from major AI conferences such as NeurIPS, ICML, or CVPR
OpenAI Releases MLE-bench with 75 Kaggle Competitions to Evaluate AI Agents' ML Engineering Skills
Oct 10, 2024, 05:33 PM
OpenAI has announced MLE-bench, a new benchmark for evaluating the machine-learning engineering capabilities of AI agents. The benchmark comprises 75 real-world ML engineering competitions sourced from Kaggle, measuring how well agents perform practical engineering work and bridging the gap between theoretical AI knowledge and real-world application. Its release could accelerate the development of AI agents capable of writing machine learning code, potentially leading to self-improving AI systems, and raises the prospect of agents one day achieving Kaggle Grandmaster status.
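The percentile buckets in the related markets below (Top 50%, Top 25%, Top 10%) suggest that MLE-bench grades an agent's submission relative to the human Kaggle leaderboard for each competition. As a rough, hypothetical sketch of how such leaderboard-relative grading can work (the function names and medal cutoffs are illustrative assumptions, not the official mle-bench implementation):

# Hypothetical sketch of leaderboard-relative grading (not the official
# mle-bench code): rank one submission against human Kaggle scores and
# map the resulting percentile to a medal tier.
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, leaderboard: list[float],
                    higher_is_better: bool = True) -> float:
    """Return the fraction of leaderboard entries this score beats (0.0-1.0)."""
    scores = sorted(leaderboard)
    if higher_is_better:
        beaten = bisect_left(scores, score)                 # entries strictly below
    else:
        beaten = len(scores) - bisect_right(scores, score)  # entries strictly above
    return beaten / len(scores)


def medal(pct: float) -> str:
    """Map a percentile to a medal tier; cutoffs here are illustrative only."""
    if pct >= 0.90:
        return "gold"    # top 10%
    if pct >= 0.80:
        return "silver"  # top 20%
    if pct >= 0.60:
        return "bronze"  # top 40%
    return "none"


if __name__ == "__main__":
    human_scores = [0.71, 0.74, 0.78, 0.80, 0.83, 0.85, 0.88, 0.90, 0.92, 0.95]
    agent_score = 0.91
    pct = percentile_rank(agent_score, human_scores)
    print(f"beats {pct:.0%} of entries -> {medal(pct)} medal")

Here a submission beating 80% of the leaderboard lands in the silver band; real Kaggle medal cutoffs additionally vary with the number of competing teams.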
NeurIPS • 25%
ICML • 25%
AAAI • 25%
CVPR • 25%

Tech industry • 25%
Healthcare • 25%
Finance • 25%
Education • 25%

Yes • 50%
No • 50%

ChatGPT-4o • 25%
Google's Gemini • 25%
Another AI model • 25%
No clear leader • 25%

Yes, at NeurIPS 2024 • 25%
Yes, at ICML 2024 • 25%
Yes, at both NeurIPS and ICML • 25%
No, it will not be featured at any major AI conference • 25%

Llama 3.1 405B • 25%
GPT-4o • 25%
Claude Sonnet 3.5 • 25%
Other • 25%

Yes, it will outperform AdamW • 25%
Yes, it will outperform SGD • 25%
Yes, it will outperform AdaGrad • 25%
No, it will not outperform any • 25%

MMLU • 25%
ARC • 25%
GSM8K • 25%
None by June 30, 2024 • 25%

Below 50% • 25%
Top 50% • 25%
Top 25% • 25%
Top 10% • 25%

Other • 25%
OpenAI • 25%
Google DeepMind • 25%
Meta AI • 25%