Which large language model will achieve the highest pass@200 score on LiveCodeBench using PlanSearch by the end of 2024?
Claude 3.5 • 25%
GPT-4 • 25%
Bard • 25%
Other • 25%
Resolution source: LiveCodeBench official results or credible AI research publications
Scale AI's PlanSearch Boosts Claude 3.5 Code Generation to 77.0%
Sep 8, 2024, 03:06 PM
Scale AI, in collaboration with the California Institute of Technology, Northeastern University, and Cursor AI, has introduced a new state-of-the-art (SOTA) test-time compute method called PlanSearch. The algorithm increases the diversity and efficiency of large language model (LLM) code generation by first creating high-level plans in natural language and then using those plans to guide the coding process. The method is reported to improve the creativity and diversity of candidate solutions as well as the quality of the generated code. Notably, Claude 3.5 with PlanSearch achieved a pass@200 of 77.0% on LiveCodeBench, compared with the best score achieved without the search algorithm (pass@1 = 41.4%). The optillm library, an optimizing inference proxy, implements the core idea of PlanSearch. The work, by researchers E. Wang, F. Cassano, C. Wu, and Y. Bai, represents a significant step in optimizing the inference-time capabilities and performance of LLMs.
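For readers unfamiliar with the pass@1 and pass@200 figures quoted above, the following is a minimal sketch of how a pass@k score is commonly estimated. It assumes the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The story does not say which estimator was used for these results, so treat this as illustrative only; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k),
    i.e. the probability that a random size-k subset of the n samples
    contains at least one correct solution.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset has a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Because pass@k counts a problem as solved when any one of k attempts succeeds, it rewards diverse candidate solutions, which is the property PlanSearch is described as targeting.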