Which model will top the PlanBench planning benchmark by March 31, 2025?
OpenAI o1 • 33%
Anthropic • 33%
Other • 33%
Next evaluation results published on the PlanBench website or in an official research paper
OpenAI Expands o1 AI Models to Enterprise and Education, Competes with Anthropic, Increases Rate Limits
Sep 24, 2024, 12:35 PM
OpenAI has expanded its o1 AI models to the enterprise and education sectors, bringing o1-preview and o1-mini to those customers. The o1 model, which includes advanced reasoning capabilities, has been evaluated in a new research paper by researchers from Arizona State University. The study indicates that although o1 outperforms other language models on the PlanBench planning benchmark, it still faces challenges in accuracy, efficiency, and reliability. The paper also notes that domain-independent planners such as Fast Downward can solve every instance of Mystery Blocksworld, whereas LLMs struggle with it. Separately, OpenAI has increased access and rate limits for developers, allowing up to 1,000 requests per minute for o1-preview and 5,000 for o1-mini. The expansion comes as OpenAI competes with Anthropic for enterprise customers.