DeepNewz Markets

Market

Which model will top the PlanBench planning benchmark by March 31, 2025?

OpenAI • Arizona State University • PlanBench • Fast Downward • Mystery Blocksworld • Anthropic

Resolution / Starting Odds

OpenAI o1 • 33%

Anthropic • 33%

Other • 33%

Resolves on the next evaluation results published on the PlanBench website or in an official research paper

Story

OpenAI Expands o1 AI Models to Enterprise and Education, Competes with Anthropic, Increases Rate Limits

Sep 24, 2024, 12:35 PM

OpenAI has expanded its o1 AI models, o1-mini and o1-preview, to the enterprise and education sectors. The o1 model, which includes advanced reasoning capabilities, has also been evaluated in a new research paper by researchers from Arizona State University. The study finds that although o1 outperforms other language models on the PlanBench planning benchmark, it still faces challenges in accuracy, efficiency, and reliability. The paper also notes that domain-independent planners such as Fast Downward can solve all instances of Mystery Blocksworld, whereas LLMs struggle with them. Separately, OpenAI has increased access and rate limits for developers, allowing up to 1,000 requests per minute for o1-preview and 5,000 for o1-mini. The expansion comes as OpenAI competes with Anthropic for enterprise customers.

Similar markets

Which model will have the best performance in benchmarks by end of 2024?

GPT-4o • 33%

Gemini 1.5 • 33%

Claude 3.5 Sonnet • 34%

Which proprietary model will Molmo outperform next in a publicly available benchmark test by June 30, 2025?

GPT-4V • 25%

Claude 3.5 Sonnet • 25%

Flash • 25%

Other • 25%

Which model will be the most fine-tuned by October 31, 2024?

GPT-4o • 25%

GPT-4o mini • 25%

Other OpenAI models • 25%

Non-OpenAI models • 25%

Which model will have the highest citation F1 score in a major independent benchmark by end of 2024?

ContextCite • 25%

LongCite-8B • 25%

LongCite-9B • 25%

GPT-4o • 25%

Top AI model on SWE-bench by March 2025?

Llama 3.1 8B • 25%

GPT-4 • 25%

Llama 3.1 70B • 25%

Other • 25%

Which AI model will be ranked first on LiveBench AI on December 31, 2024?

OpenAI o1-preview • 25%

Anthropic Claude 3.5 Sonnet • 25%

OpenAI o1-mini • 25%

Other • 25%

Which AI model will achieve the highest performance benchmark by December 31, 2024?

Meta's Llama 3.1-70B • 25%

OpenAI's GPT-4 • 25%

Google's Bard • 25%

Other • 25%

Which company will achieve the highest score on SWE-Bench in 2024?

Cosine • 25%

Amazon • 25%

Cognition • 25%

Other • 25%

Which AI model will be top-performing in benchmarks by end of 2024?

Llama 3.1 405B • 25%

GPT-4o • 25%

Claude 3.5 Sonnet • 25%

Other • 25%

Which AI model will be the top performer in RewardBench by end of 2024?

Llama 3-70B • 25%

GPT-4 • 25%

Claude 2.0 • 25%

Other • 25%

Which benchmark will o3 improve most by end of 2025?

ARC-AGI • 25%

Frontier Math • 25%

SWE-Bench • 25%

ARC-AGI Semi-Private Evaluation • 25%

SWE-Bench Verified test • 25%

AIME • 25%

GPQA-Diamond benchmark • 25%

Which AI model will lead in benchmarks by end of 2025?

ChatGPT-4o • 25%

Google's Gemini • 25%

Another AI model • 25%

No clear leader • 25%

Markets based on same story

Will OpenAI announce a new version of the o1 model with improved accuracy and efficiency by June 30, 2025?

No • 50%

Yes • 50%

Will OpenAI secure a major partnership with a Fortune 500 company for its o1 AI models by December 31, 2024?

No • 50%

Yes • 50%

Will OpenAI's o1 model outperform Anthropic's model on PlanBench by March 31, 2025?

No • 50%

Yes • 50%

Which sector will primarily adopt OpenAI's o1 AI models by December 31, 2024?

Education • 33%

Enterprise • 33%

Both equally • 33%