Loading...
Loading...
Browse all stories on DeepNewz
VisitWhich AI model will be the best at reasoning tasks on December 31, 2024?
OpenAI o1-preview • 25%
Anthropic Claude 3.5 Sonnet • 25%
OpenAI o1 mini • 25%
Other • 25%
LiveBench AI reasoning task performance metrics
OpenAI o1-Preview Takes First Place Overall on LiveBench AI, Surpasses Claude 3.5 Sonnet
Sep 13, 2024, 02:03 PM
OpenAI's new o1-preview model has claimed the top spot on LiveBench AI, surpassing Anthropic's Claude 3.5 Sonnet, which had led for 85 days. The o1-preview model excels in reasoning, mathematics, data analysis, and language, although it is not perfect in every area. Notably, OpenAI's o1 mini model is reported to be even better at reasoning than the o1-preview. Despite these advancements, Claude 3.5 Sonnet still outperforms the o1-preview in coding tasks. OpenAI o1-preview is now first place overall on LiveBench AI.
View original story
Gemini 2.0 Flash Thinking • 25%
OpenAI's latest model • 25%
Microsoft's latest model • 25%
Other • 25%
OpenAI's O1 model • 25%
GPT-4 • 25%
Gemini • 25%
Anthropic's Claude • 25%
Claude 3.5 Haiku • 25%
Claude 3.5 Sonnet • 25%
GPT-4o-mini • 25%
GPT-4o • 25%
Meta's Llama 3.1-70B • 25%
OpenAI's GPT-4 • 25%
Google's Bard • 25%
Other • 25%
Claude 3.5 Sonnet • 33%
GPT-4o • 33%
Gemini • 34%
Llama 3.1 405B • 25%
GPT-4o • 25%
Claude Sonnet 3.5 • 25%
Other • 25%
Google's Gemini • 25%
OpenAI's GPT • 25%
Microsoft's Azure AI • 25%
Other • 25%
Claude 3.5 Sonnet • 33%
GPT-4o • 33%
Google's AI Model • 33%
GPT-4o • 25%
InternVL 2 • 25%
NVLM 1.0 • 25%
Other • 25%
Imagen 3 • 25%
DALL-E 3 • 25%
Midjourney v6 • 25%
Stable Diffusion 3 • 25%
Grok 3.0 • 25%
OpenAI GPT-5 • 25%
Google DeepMind's latest model • 25%
Other • 25%
Google • 25%
OpenAI • 25%
DeepMind • 25%
Other • 25%
No • 50%
Yes • 50%
Yes • 50%
No • 50%
No • 50%
Yes • 50%
OpenAI o1-preview • 25%
Other • 25%
OpenAI o1 mini • 25%
Anthropic Claude 3.5 Sonnet • 25%