OpenAI's 'o3' Models with Breakthrough AI Reasoning Surpass Human Performance on ARC-AGI
Dec 20, 2024, 06:10 PM
OpenAI has announced its latest AI reasoning models, 'o3' and 'o3-mini', marking a significant advance in artificial intelligence capabilities. The 'o3' model is the successor to 'o1'; OpenAI skipped the name 'o2' because of potential trademark issues with the telecommunications company O2. The model is designed to give more thoughtful, contextual responses by 'thinking' before it answers, using a 'private chain of thought'.

OpenAI collaborated with ARC to test 'o3' on ARC-AGI, and the testers believe the results mark a qualitative shift in AI capabilities compared with the prior limitations of large language models. 'o3' achieved state-of-the-art performance across several benchmarks. On the ARC-AGI semi-private evaluation it scored 87.5% in high-compute mode, surpassing human performance, which is estimated at 85%, and 75.7% in low-compute mode. On the FrontierMath benchmark of extremely hard math problems, 'o3' solved 25.2%, a substantial increase from the previous best of 2%. It also scored 71.7% on SWE-bench Verified, more than 20 percentage points higher than 'o1', and achieved a Codeforces rating of 2727, equivalent to the 175th best human competitive coder.

The models are currently available to a limited group of outside researchers for safety testing. 'o3-mini' is expected to launch publicly by the end of January 2025, with 'o3' following shortly thereafter.
Markets
Yes • 50%
No • 50%
Official announcement by OpenAI on their website or press release
No • 50%
Yes • 50%
Official Codeforces rating updates and OpenAI announcements
No • 50%
Yes • 50%
Published results on OpenAI's official channels or peer-reviewed publications
Below 85% • 25%
Above 95% • 25%
90% to 95% • 25%
85% to 90% • 25%
Published results on OpenAI's official channels or peer-reviewed publications
Above 80% • 25%
Below 70% • 25%
70% to 75% • 25%
75% to 80% • 25%
Published results on OpenAI's official channels or peer-reviewed publications
Top 100 • 25%
Top 200 • 25%
Top 300 • 25%
Below Top 300 • 25%
Official Codeforces ranking updates and OpenAI announcements