DeepNewz Markets

Scale AI's PlanSearch Boosts Claude 3.5 Code Generation to 77.0%

Sep 8, 2024, 03:06 PM

Scale AI, in collaboration with the California Institute of Technology, Northeastern University, and Cursor AI, has introduced PlanSearch, a test-time compute method that the team presents as state of the art (SOTA). The algorithm aims to improve diversity and efficiency in large language model (LLM) code generation by first creating high-level plans in natural language and then using those plans to guide the coding process. The team reports improvements in the diversity of candidate solutions and in the quality of the generated code. Notably, Claude 3.5 with PlanSearch achieved a pass@200 of 77.0% on LiveCodeBench, compared with the best score achieved without the search algorithm (pass@1 = 41.4%). The two figures use different sampling budgets: pass@200 counts a problem as solved if any of 200 sampled solutions passes, while pass@1 allows a single sample. The optillm library, an optimizing inference proxy, implements the core idea of PlanSearch. The work, which involves researchers E Wang, F Cassano, C Wu, and Y Bai, is a notable step in improving the inference-time performance of LLMs.
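
The story compares a pass@200 figure with a pass@1 figure, so it may help to see how pass@k is commonly estimated. The sketch below uses the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples were generated for a problem and c of them passed. The function name pass_at_k and the example counts are illustrative assumptions; the story does not say how the reported scores were computed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed all tests
    k: sampling budget being scored (k <= n)

    Returns the probability that at least one of k samples, drawn
    without replacement from the n generated, is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the draws of k samples that are all incorrect;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 200 samples, 5 of them correct.
    for k in (1, 10, 200):
        print(k, pass_at_k(200, 5, k))
```

A benchmark score is then the mean of this value over all problems. With any fixed number of correct samples, the estimate can only grow as k grows, which is why a pass@200 score and a pass@1 score are not directly comparable.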

Markets

Will another major AI company adopt PlanSearch by the end of 2024?

AI•California Institute of Technology•Northeastern University•Cursor AI•PlanSearch•LiveCodeBench

Starting odds: No 50% • Yes 50%

Resolution source: Official announcements from AI companies or credible news sources

Will Claude 3.5 achieve a pass@200 score of 80% or higher on LiveCodeBench by the end of 2024?

Starting odds: No 50% • Yes 50%

Resolution source: LiveCodeBench official results or credible AI research publications

Will PlanSearch be integrated into a publicly available open-source LLM library by mid-2024?

Starting odds: Yes 50% • No 50%

Resolution source: Open-source repositories like GitHub or official announcements

Which large language model will achieve the highest pass@200 score on LiveCodeBench using PlanSearch by the end of 2024?

Starting odds: Claude 3.5 25% • Other 25% • Bard 25% • GPT-4 25%

Resolution source: LiveCodeBench official results or credible AI research publications

Which major AI company will adopt PlanSearch first by the end of 2024?

Starting odds: Other 25% • OpenAI 25% • Google DeepMind 25% • Microsoft 25%

Resolution source: Official announcements from AI companies or credible news sources

Which research paper will cite the PlanSearch algorithm the most by the end of 2024?

Starting odds: A paper from Cursor AI 25% • Other 25% • A paper from Caltech 25% • A paper from Northeastern University 25%

Resolution source: Research paper citation counts from sources like Google Scholar or Semantic Scholar