Which major AI model will surpass Llama 3-70B in performance by mid-2025?
GPT-5 • 25%
BERT-3 • 25%
Claude 3.0 • 25%
Other • 25%
Official AI benchmark results published by recognized institutions
Meta FAIR's Self-Taught Evaluators Boost Llama 3-70B, Surpass GPT-4 in AI Evaluation
Aug 6, 2024, 03:47 PM
Meta, through its FAIR division, has introduced a new AI approach called 'Self-Taught Evaluators' that aims to improve the evaluation of language models without human annotations. The method trains models on synthetic training data using an iterative self-improvement scheme. Self-Taught Evaluators have outperformed commonly used language-model judges such as GPT-4 and match top reward models trained with human-labeled examples. The approach generates contrasting output pairs and trains a language model as a judge that produces reasoning traces followed by final judgments. Notably, it boosted Llama 3-70B's score on RewardBench to 88.3 (88.7 with majority vote), outperforming larger models and judges trained on human labels.
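The loop the summary describes — generate contrasting response pairs, have the current judge render verdicts, keep the pairs it judges correctly, then fine-tune the judge on them and repeat — can be caricatured in a short, self-contained toy. Here a "response" is just a number, quality means closeness to an unknown target, the judge is a single learnable parameter, and "fine-tuning" is re-estimating that parameter. Every name and mechanism below is an illustrative stand-in, not Meta's implementation, which uses actual LLM generation, reasoning traces, and supervised fine-tuning.

```python
import random

random.seed(0)

# Toy world: a response is a float; quality = closeness to TARGET,
# which the judge does not know and must effectively learn.
TARGET = 10.0


def generate_contrasting_pair():
    """Synthetic preference pair with no human annotation: a good
    response plus a deliberately degraded variant of it."""
    good = TARGET + random.gauss(0, 1.0)
    bad = good + random.uniform(3.0, 6.0)  # worse by construction
    return good, bad


def judge(w, a, b):
    """The 'judge' prefers the response nearer its current estimate w.
    (A real judge would emit a reasoning trace before the verdict.)"""
    return "A" if abs(a - w) < abs(b - w) else "B"


def self_taught_evaluator(iterations=5, pairs_per_iter=200, w0=6.0):
    """Iterative self-improvement: label synthetic pairs with the
    current judge, keep pairs whose verdict matches the construction
    (good preferred), then 'fine-tune' on them — here, re-estimating
    w as the mean of the kept good responses."""
    w = w0
    for _ in range(iterations):
        kept = []
        for _ in range(pairs_per_iter):
            good, bad = generate_contrasting_pair()
            if judge(w, good, bad) == "A":  # verdict agrees with label
                kept.append(good)
        if kept:  # stand-in for a fine-tuning step
            w = sum(kept) / len(kept)
    return w
```

Running `self_taught_evaluator()` moves the judge's estimate from the deliberately wrong starting point (6.0) toward the true target (10.0), mirroring how each round of self-generated training data improves the evaluator that produced it.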