DeepNewz Markets

Market

Will o3-mini achieve 2800+ Codeforces rating by end of 2025?

OpenAI • Codeforces • American Invitational Mathematics Examination • AIME • EpochAI • Frontier Math • François Chollet • British • O2

Resolution / Starting Odds

Yes • 50%

No • 50%

Resolution source: Codeforces rating results published by OpenAI or the Codeforces platform

Story

OpenAI Unveils o3 and o3-mini, Surpasses Human-Level ARC-AGI Performance, Sets New AI Benchmarks

Dec 20, 2024, 06:23 PM

OpenAI has unveiled 'o3' and 'o3-mini', its next-generation reasoning models, which are designed to adapt better to novel tasks by using a 'private chain of thought' and fact-checking their own output. Announced on December 20, 2024, o3 outperforms previous models on a range of benchmarks. It scored 87.5% on the ARC-AGI Semi-Private Evaluation in high-compute mode, above the human performance threshold of 85%, with high-compute tasks costing $3,500 each; in low-compute mode it scored 75.7% on the same evaluation. The model also set new records on other technical benchmarks: 71.7% on SWE-Bench Verified, a 2727 rating on Codeforces, 96.7% on the American Invitational Mathematics Examination (AIME), and 87.7% on GPQA-Diamond. It achieved 25.2% on EpochAI's Frontier Math problems, a significant jump from the previous best of 2%. François Chollet, a prominent AI researcher, stated that o3 represents "not merely incremental improvement, but a genuine breakthrough" in AI's ability to adapt to novel tasks. OpenAI plans to release o3-mini to the public by the end of January 2025, with o3 following shortly after. OpenAI skipped the name 'o2' because of potential trademark issues with British telecommunications firm O2.


Similar markets

Will OpenAI's 'o3' model achieve a Codeforces rating of 2800+ by end of 2025?

Will OpenAI's 'o3' model reach a Codeforces rating of 2800 by March 2025?

Where will 'o3' rank among competitive coders by end of 2025?

Will Gukesh D achieve a FIDE rating of 2850 or higher by the end of 2025?

Will D Gukesh achieve FIDE rating of 2850+ by end of 2025?

Will D Gukesh achieve a FIDE rating of 2800 by the end of 2025?

What will be the SWE-Bench Verified score of 'o3' by end of 2025?

Will Gukesh cross 2900 Elo rating by the end of 2025?

D. Gukesh to surpass 2850 Elo rating by end of 2025?

Will FLUX.1[pro] achieve an ELO score higher than 95 by the end of 2024?

Will Gukesh D reach a 2900 Elo rating by the end of 2025?

Will Gemini 1.5 Pro surpass a 1400 ELO rating by the end of 2024?

OpenAI o3 public release by February 28, 2025?

Will OpenAI's o3 score 90%+ on ARC-AGI by end of 2025?

First company to integrate OpenAI's o3 by end of 2025?

Public reception of o3-mini by March 2025?
