DeepNewz Markets

Market

Where will 'o3' rank among competitive coders by end of 2025?

OpenAI•O2•ARC•Frontier Math•Codeforces

Resolution / Starting Odds

Top 100 • 25%

Top 200 • 25%

Top 300 • 25%

Below Top 300 • 25%

Official Codeforces ranking updates and OpenAI announcements
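The four outcomes above can be read as mutually exclusive rank bands. The following is a minimal sketch of that reading, not the market's official resolution logic. It assumes that "Top 200" means ranks 101 to 200 and "Top 300" means ranks 201 to 300, which the listing does not state. The function name `resolution_bucket` is hypothetical.

```python
def resolution_bucket(rank: int) -> str:
    """Map a 1-based leaderboard rank to one of the four market outcomes.

    Assumption: the bands are exclusive (1-100, 101-200, 201-300, 301+).
    The market page does not spell out the boundaries.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank <= 100:
        return "Top 100"
    if rank <= 200:
        return "Top 200"
    if rank <= 300:
        return "Top 300"
    return "Below Top 300"


# The story reports a rating equivalent to the 175th best human coder.
print(resolution_bucket(175))  # prints "Top 200"
```

Under this reading, the rank reported at announcement would fall in the "Top 200" band.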

Story

OpenAI's 'o3' Models with Breakthrough AI Reasoning Surpass Human Performance on ARC-AGI

Dec 20, 2024, 06:10 PM

OpenAI has announced its latest AI reasoning models, 'o3' and 'o3-mini'. The 'o3' model is the successor to 'o1'; OpenAI skipped the name 'o2' to avoid a potential trademark conflict with the telecommunications company O2. The model is designed to 'think' before responding via a 'private chain of thought', which is intended to produce more considered, context-aware answers.

OpenAI collaborated with ARC to test 'o3' on ARC-AGI, and the testers believe the results mark a qualitative shift in AI capabilities compared with the prior limitations of large language models. 'o3' achieved state-of-the-art performance across several benchmarks. On the ARC-AGI semi-private evaluation it scored 87.5% in high-compute mode, above the estimated human performance of 85%, and 75.7% in low-compute mode. On the Frontier Math benchmark, 'o3' solved 25.2% of the hardest math questions, up from the previous best of 2%. It also scored 71.7% on SWE-Bench Verified, more than 20 percentage points higher than 'o1', and achieved a Codeforces rating of 2727, equivalent to the 175th best human competitive coder.

The models are currently available to a limited group of outside researchers for safety testing. 'o3-mini' is expected to launch publicly by the end of January 2025, with 'o3' following shortly thereafter.

Similar markets

4-5 benchmarks • 25%

More than 5 benchmarks • 25%

Which benchmark will o3 improve most by end of 2025?

ARC-AGI • 25%

Frontier Math • 25%

ARC-AGI Semi-Private Evaluation • 25%

SWE-Bench • 25%

GPQA-Diamond benchmark • 25%

AIME • 25%

SWE-Bench Verified test • 25%

How will OpenAI's 'o3' perform on ARC-AGI benchmark compared to others by 2025?

o3 remains the top performer • 25%

o3 ties with another model • 25%

No new models tested • 25%

Another model surpasses o3 • 25%

What ranking will DeepSeek-V3 achieve in a global AI competition by end of 2025?

1st place • 25%

Top 3 • 25%

Top 10 • 25%

Outside Top 10 • 25%

What will be DeepSeek-V3's rank on Aider leaderboard by end of 2025?

1st Place • 25%

2nd Place • 25%

3rd Place • 25%

4th Place or lower • 25%

Where will Grok 3 rank among AI models by end of 2025?

Top 3 • 25%

Outside Top 20 • 25%

Top 20 • 25%

Top 10 • 25%

What will be the accuracy improvement of OpenAI's o3 model over o1-preview by end of 2025?

10% to 20% • 25%

Less than 5% • 25%

More than 20% • 25%

5% to 10% • 25%

Top 1 • 25%

Top 3 • 25%

Similar markets

Will o3-mini achieve 2800+ Codeforces rating by end of 2025?

Will OpenAI's 'o3' model reach a Codeforces rating of 2800 by March 2025?

How many AI benchmarks will OpenAI's 'o3' set records in by the end of 2025?

Which benchmark will o3 improve most by end of 2025?

How will OpenAI's 'o3' perform on ARC-AGI benchmark compared to others by 2025?

What ranking will DeepSeek-V3 achieve in a global AI competition by end of 2025?

What will be DeepSeek-V3's rank on Aider leaderboard by end of 2025?

Where will Grok 3 rank among AI models by end of 2025?

What will be the accuracy improvement of OpenAI's o3 model over o1-preview by end of 2025?

Will OpenAI's o3 score 90%+ on ARC-AGI by end of 2025?

Will OpenAI's 'o3' model exceed 90% on ARC-AGI benchmark by end of 2025?

What position will DeepSeek-V3 achieve in a major AI competition by June 30, 2025?

Will OpenAI publicly launch 'o3-mini' by January 31, 2025?

Will OpenAI's 'o3' model achieve a Codeforces rating of 2800+ by end of 2025?

Will OpenAI's 'o3' model surpass 30% on Frontier Math by the end of 2025?

What will be the ARC-AGI high-compute performance of 'o3' by end of 2025?
