DeepNewz Markets

Market

Will PlanSearch be integrated into a publicly available open-source LLM library by mid-2024?

AI•California Institute of Technology•Northeastern University•Cursor AI•PlanSearch•LiveCodeBench

Resolution / Starting Odds

Yes • 50%

No • 50%

Open-source repositories like GitHub or official announcements

Story

Scale AI's PlanSearch Boosts Claude 3.5 Code Generation to 77.0%

Sep 8, 2024, 03:06 PM

Scale AI, in collaboration with the California Institute of Technology, Northeastern University, and Cursor AI, has introduced PlanSearch, a new test-time compute method reported as state of the art (SOTA). The algorithm increases the diversity and efficiency of large language model (LLM) code generation by first producing high-level plans in natural language and then using those plans to guide the coding process. The method is reported to improve the creativity, diversity, and quality of the generated code. Notably, Claude 3.5 with PlanSearch achieved a pass@200 of 77.0% on LiveCodeBench, compared with the best score achieved without the search algorithm (pass@1 = 41.4%). The optillm library, an optimizing inference proxy, implements the core idea of PlanSearch. The work, involving researchers E Wang, F Cassano, C Wu, and Y Bai, is presented as a significant step in optimizing the inference-time capabilities and performance of LLMs.

Similar markets

Will PlanSearch be integrated into another major LLM by March 31, 2025?

No • 50%

Yes • 50%

Which major LLM provider will adopt PlanSearch next by June 30, 2025?

OpenAI • 25%

Google • 25%

Microsoft • 25%

Other • 25%

Will Meta's Llama 3.1 405B be integrated into a major open-source project by end of 2024?

Yes • 50%

No • 50%

What will be the next significant feature or improvement added to PlanSearch by June 30, 2025?

Speed Optimization • 25%

Accuracy Improvement • 25%

Resource Efficiency • 25%

Other • 25%

Not adopted • 25%

Other • 25%

Will Meta release a new version of LLM Compiler by March 31, 2024?

Yes • 50%

No • 50%

Will LexicMap be integrated into major genomic databases or tools by the end of 2024?

GenBank • 25%

RefSeq • 25%

Both GenBank and RefSeq • 25%

Neither • 25%

Will Llama 3.1 405B be integrated into more than 10 major platforms by end of 2024?

Yes • 50%

No • 50%

Will Braintrust launch a new LLM-focused product feature by mid-2025?

Will Scale AI release another significant update to PlanSearch by December 31, 2024?

New LLM retrieval method included in benchmark leaderboard by end of 2024?

Will SpreadsheetLLM be integrated into Microsoft Excel by the end of 2024?

How will the new method to enhance the security of open-source LLMs be adopted by end of 2024?

Will another major AI company adopt PlanSearch by the end of 2024?

Will Claude 3.5 achieve a pass@200 score of 80% or higher on LiveCodeBench by the end of 2024?

Which large language model will achieve the highest pass@200 score on LiveCodeBench using PlanSearch by the end of 2024?

Which major AI company will adopt PlanSearch first by the end of 2024?
