DeepNewz Markets

Market

What will be the performance improvement of Claude 3.5 on LiveCodeBench with PlanSearch by December 31, 2024?

PlanSearch•LiveCodeBench

Resolution / Starting Odds

Less than 75% • 25%

75% to 79.9% • 25%

80% to 84.9% • 25%

85% or more • 25%

LiveCodeBench official results or Scale AI's official reports

Story

Scale AI's PlanSearch Enhances Claude 3.5 LLM Code Generation with SOTA Method

Sep 9, 2024, 01:16 PM

Scale AI has proposed PlanSearch, a search algorithm intended to increase the diversity and efficiency of large language model (LLM) code generation. Applied to Claude 3.5, it reaches a pass@200 of 77.0% on LiveCodeBench, compared with a pass@1 of 41.4% without search. The method targets the problem of scaling inference-time compute effectively and is described as a state-of-the-art (SOTA) test-time compute method. The work is part of broader efforts to improve LLMs, which are reshaping interactions with technology through applications such as AI-powered chatbots and complex language understanding tasks.
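The story compares a pass@200 figure with a pass@1 figure. It does not say how either was computed, so the following is only a minimal sketch of the widely used unbiased pass@k estimator: for a problem with n sampled solutions of which c pass the tests, pass@k is estimated as 1 - C(n-c, k) / C(n, k). The function names and the sample data are hypothetical and are not taken from Scale AI or LiveCodeBench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of solutions sampled for the problem
    c: number of those solutions that passed all tests
    k: budget of attempts being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts (n samples, c correct), for illustration only.
example_results = [(10, 0), (10, 5), (10, 10)]
benchmark_score = mean_pass_at_k(example_results, k=1)
```

Because pass@k never decreases as k grows, a pass@200 score and a pass@1 score measure different attempt budgets and are not directly interchangeable.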


Similar markets

Top 10 • 25%

Outside Top 10 • 25%

Which company will surpass Claude 3.5 Haiku in coding benchmarks by mid-2025?

OpenAI • 25%

Google • 25%

Microsoft • 25%

Other • 25%

Bard • 25%

Other • 25%

Will Claude 3.5 achieve a pass@200 score of 80% or higher on LiveCodeBench by the end of 2024?

Will Claude 3.5 Sonnet achieve 55% on the SWE-bench by March 31, 2025?

Will Claude 3.5 Sonnet score 55%+ on SWE-bench by end of 2024?

Where will Claude 3.5 rank in AI model performance by May 31, 2025?

Which company will surpass Claude 3.5 Haiku in coding benchmarks by mid-2025?

Will Claude 3.5 Sonnet surpass GPT-4o in public benchmark by March 2025?

Will Claude 3.5 Sonnet rank first in at least one major AI benchmark by end of 2024?

Claude 3.5 maintains SEAL leaderboard top position by end of 2024?

Will Claude 3.5 Sonnet be integrated into a major commercial application by the end of 2024?

Claude 3.5 Sonnet beats GPT-4o in all major AI benchmarks by end of 2024?

Will Claude 3.5 Sonnet be used in an app with over 1 million downloads by June 30, 2024?

Which large language model will achieve the highest pass@200 score on LiveCodeBench using PlanSearch by the end of 2024?

Will Claude 3.5 achieve a pass@200 of 80% or more on LiveCodeBench by June 30, 2025?

Will PlanSearch be integrated into another major LLM by March 31, 2025?

Will Scale AI release another significant update to PlanSearch by December 31, 2024?

What will be the next significant feature or improvement added to PlanSearch by June 30, 2025?
