DeepNewz Markets

Market

How many AI benchmarks will OpenAI's 'o3' set records in by the end of 2025?

OpenAI • Frontier Math • American Invitational Mathematics Examination • AIME • Codeforces • O2

Resolution / Starting Odds

0-1 benchmarks • 25%

2-3 benchmarks • 25%

4-5 benchmarks • 25%

More than 5 benchmarks • 25%

AI research publications and benchmark result announcements

Story

OpenAI's 'o3' Surpasses Human Performance; 'o3-mini' Launching January 2025

Dec 20, 2024, 06:42 PM

OpenAI has announced 'o3' and 'o3-mini', its next-generation reasoning models, which outperform previous AI models on several benchmarks. The 'o3' model scored 75.7% on the ARC-AGI benchmark in low-compute mode and 87.5% in high-compute mode, exceeding the human performance threshold of 85%. It also set new records on other benchmarks: it solved 25.2% of Frontier Math problems (the previous best was 2%), scored 96.7% on the American Invitational Mathematics Examination (AIME), and achieved 71.7% on SWE-bench Verified. Its Codeforces rating of 2727 places it in the top 0.05% of competitive programmers. The 'o3' models are designed to 'think' before responding via a 'private chain of thought', representing a significant leap in AI's ability to adapt to novel tasks and a qualitative shift in AI capabilities. The company skipped the name 'o2' because of potential trademark issues with the telecommunications firm O2. OpenAI plans to release 'o3-mini' publicly by the end of January 2025, with the full 'o3' model to follow shortly after.

Similar markets

Will OpenAI's 'o3' model surpass 30% on Frontier Math by the end of 2025?

Yes • 50%

No • 50%

Will OpenAI's 'o3' model exceed 90% on ARC-AGI benchmark by end of 2025?

Yes • 50%

No • 50%

How many AI models will OpenAI's o1 model train from scratch by December 31, 2024?

None • 25%

1 to 2 • 25%

3 to 4 • 25%

5 or more • 25%

What will be the accuracy improvement of OpenAI's o3 model over o1-preview by end of 2025?

Less than 5% • 25%

5% to 10% • 25%

10% to 20% • 25%

More than 20% • 25%

Will OpenAI's 'o3' model achieve a Codeforces rating of 2800+ by end of 2025?

Yes • 50%

No • 50%

What will be OpenAI o1 model's performance on benchmark tasks by end of 2024?

Top 10% • 25%

Top 1% • 25%

Top 5% • 25%

Below Top 10% • 25%

Will OpenAI's o1 model achieve a new benchmark performance in AI research by January 31, 2025?

Yes • 50%

No • 50%

Will OpenAI's o3 score 90%+ on ARC-AGI by end of 2025?

Yes • 50%

No • 50%

Which feature of OpenAI's o1 model will set a new benchmark in AI performance by June 30, 2025?

Reinforcement learning • 25%

Search-based reasoning • 25%

Thinking before answering • 25%

Other • 25%

Will OpenAI's 'o1' model surpass GPT-4o in a public benchmark by end of 2024?

Yes • 50%

No • 50%

What will be the IQ score of the next version of OpenAI's o1 model by June 30, 2025?

Below 120 • 25%

120-130 • 25%

130-140 • 25%

Above 140 • 25%

In which area will OpenAI's o1 model show the next major performance improvement by March 31, 2025?

Improvement in medical reasoning • 25%

Improvement in coding tasks • 25%

Improvement in scientific reasoning • 25%

Other • 25%

Markets based on same story

Will a Fortune 500 company adopt OpenAI's 'o3' model for a major project by June 2025?

Yes • 50%

No • 50%

Will OpenAI release 'o3-mini' publicly by January 31, 2025?

No • 50%

Yes • 50%

Will OpenAI's 'o3' model reach a Codeforces rating of 2800 by March 2025?

No • 50%

Yes • 50%

How will OpenAI's 'o3' perform on ARC-AGI benchmark compared to others by 2025?

No new models tested • 25%

o3 remains the top performer • 25%

Another model surpasses o3 • 25%

o3 ties with another model • 25%