Will OpenAI's 'o3' model exceed 90% on the ARC-AGI benchmark by the end of 2025?
Yes • 50%
No • 50%
Published results in AI research journals or OpenAI's official announcements
OpenAI Unveils 'o3' and 'o3-mini' Models, Exceeding Human Performance on ARC-AGI
Dec 20, 2024, 05:11 PM
OpenAI announced its next-generation reasoning models, 'o3' and 'o3-mini', on the final day of its '12 Days of OpenAI' event. CEO Sam Altman, SVP of Research Mark Chen, and Research Scientist Hongyu Ren presented the models in a live stream. The 'o3' models represent a major advance in reasoning over the previous 'o1' model. 'o3' achieved state-of-the-art results on several benchmarks, including 87.5% on the ARC-AGI benchmark, above the estimated human-level performance of 85%. The model also scored 25.2% on the FrontierMath benchmark, 96.7% on the AIME, and 71.7% on SWE-bench Verified. OpenAI skipped the 'o2' name because of potential trademark issues with British telecommunications firm O2. The 'o3' models are currently available to researchers for safety testing but have not been publicly released.