DeepNewz Markets

OpenAI o1-preview Takes First Place Overall on LiveBench AI, Surpasses Claude 3.5 Sonnet

Sep 13, 2024, 02:03 PM

OpenAI's new o1-preview model has claimed the top spot on LiveBench AI, displacing Anthropic's Claude 3.5 Sonnet, which had led the leaderboard for 85 days. The o1-preview model scores strongly in reasoning, mathematics, data analysis, and language, though it does not lead in every category. Notably, OpenAI's o1-mini model is reported to score even higher on reasoning than o1-preview. Claude 3.5 Sonnet, however, still outperforms o1-preview on coding tasks.

Markets

Will OpenAI release a new model that surpasses o1-preview in overall performance by end of 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

No • 50%

Yes • 50%

Resolution source: LiveBench AI rankings and OpenAI announcements

Will OpenAI's o1-preview model maintain the top spot on LiveBench AI by end of 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

Yes • 50%

No • 50%

Resolution source: LiveBench AI rankings

Will OpenAI's o1-preview model surpass Claude 3.5 Sonnet on coding tasks by end of 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

No • 50%

Yes • 50%

Resolution source: LiveBench AI coding task performance metrics

Which AI model will be ranked first on LiveBench AI on December 31, 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

OpenAI o1-preview • 25%

Other • 25%

OpenAI o1-mini • 25%

Anthropic Claude 3.5 Sonnet • 25%

Resolution source: LiveBench AI rankings

Which AI model will be the best at coding tasks on December 31, 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

Other • 25%

OpenAI o1-preview • 25%

Anthropic Claude 3.5 Sonnet • 25%

OpenAI o1-mini • 25%

Resolution source: LiveBench AI coding task performance metrics

Which AI model will be the best at reasoning tasks on December 31, 2024?

OpenAI•LiveBench AI•Anthropic

Resolution / Starting Odds

OpenAI o1-mini • 25%

Other • 25%

OpenAI o1-preview • 25%

Anthropic Claude 3.5 Sonnet • 25%

Resolution source: LiveBench AI reasoning task performance metrics