Will the Llama 3-70B model surpass GPT-4 in a major AI benchmark by end of 2024?
Yes • 50%
No • 50%
Resolution criteria: official AI benchmark results published by recognized institutions such as OpenAI or Meta
Meta FAIR's Self-Taught Evaluators Boost Llama 3-70B, Surpass GPT-4 in AI Evaluation
Aug 6, 2024, 03:47 PM
Meta, through its FAIR division, has introduced a new AI approach called 'Self-Taught Evaluators' that aims to improve the evaluation of language models without human annotations. The method trains models on synthetic training data using an iterative self-improvement scheme: contrasting outputs are generated for each instruction, and a language model is trained to act as a judge, producing reasoning traces and final judgments. The resulting Self-Taught Evaluators outperform commonly used language-model judges such as GPT-4 and match top reward models trained with labeled examples. Notably, the approach boosted the Llama 3-70B model's performance on RewardBench to 88.3, and to 88.7 with majority vote, outperforming larger models and human labels.
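The training loop described above can be sketched in a few lines. This is a minimal illustrative simulation, not Meta's implementation: the generator, judge, and "fine-tuning" step are hypothetical stand-ins (the real pipeline runs an LLM such as Llama 3-70B for each role), and the judge is modeled as a single accuracy parameter so the loop is runnable.

```python
# Hedged sketch of an iterative self-taught-evaluator loop.
# Assumptions (not from the source): all function bodies are placeholders;
# the judge model is reduced to a float "skill" = probability of picking
# the genuinely better response.

import random

random.seed(0)


def generate_contrasting_pair(instruction: str) -> tuple[str, str]:
    """Synthetically build a (better, worse) response pair -- no human labels.
    Placeholder strings stand in for real LLM generations."""
    return f"good answer to {instruction}", f"flawed answer to {instruction}"


def judge(skill: float, instruction: str, resp_a: str, resp_b: str):
    """Judge emits a reasoning trace and a verdict ('A' or 'B').
    With probability `skill` it picks the better response; otherwise it guesses."""
    verdict = "A" if random.random() < skill else random.choice("AB")
    trace = f"Comparing responses for: {instruction}"
    return trace, verdict


def self_taught_evaluator(skill: float, instructions: list[str], iterations: int = 3) -> float:
    """Iterative self-improvement: keep only examples where the judge preferred
    the constructed-better output, then 'fine-tune' on that filtered set."""
    for _ in range(iterations):
        train_set = []
        for inst in instructions:
            better, worse = generate_contrasting_pair(inst)
            trace, verdict = judge(skill, inst, better, worse)
            if verdict == "A":  # judgment agrees with the synthetic label
                train_set.append((inst, trace, verdict))
        # Placeholder fine-tune: skill moves toward the kept-example fraction
        skill += 0.5 * (len(train_set) / len(instructions)) * (1.0 - skill)
    return skill


if __name__ == "__main__":
    final_skill = self_taught_evaluator(0.6, [f"q{i}" for i in range(100)])
    print(round(final_skill, 3))
```

Because each pass trains only on judgments consistent with the synthetic preference labels, the judge's accuracy ratchets upward across iterations without any human annotation, which is the core idea of the method.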