Which major AI model will surpass Llama 3-70B in performance by mid-2025?
GPT-5 • 25%
BERT-3 • 25%
Claude 3.0 • 25%
Other • 25%
Official AI benchmark results published by recognized institutions
Meta FAIR's Self-Taught Evaluators Boost Llama 3-70B, Surpass GPT-4 in AI Evaluation
Aug 6, 2024, 03:47 PM
Meta, through its FAIR division, has introduced a new AI approach called 'Self-Taught Evaluators' that aims to improve the evaluation of language models without human annotations. The method trains models on synthetic training data using an iterative self-improvement scheme. Self-Taught Evaluators have outperformed commonly used language-model judges such as GPT-4 and match top reward models trained with human-labeled examples. The approach generates contrasting output pairs and trains a language model as a judge that produces reasoning traces followed by final judgments. Notably, it boosted Llama 3-70B's score on RewardBench to 88.3 (88.7 with majority vote), outperforming larger models and judges trained on human labels.
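The loop the summary describes — generate contrasting response pairs, have the current judge render verdicts, keep the pairs it judges correctly, then fine-tune the judge on them and repeat — can be caricatured in a short, self-contained toy. Here a "response" is just a number, quality means closeness to an unknown target, the judge is a single learnable parameter, and "fine-tuning" is re-estimating that parameter. Every name and mechanism below is an illustrative stand-in, not Meta's implementation, which uses actual LLM generation, reasoning traces, and supervised fine-tuning.

```python
import random

random.seed(0)

# Toy world: a response is a float; quality = closeness to TARGET,
# which the judge does not know and must effectively learn.
TARGET = 10.0


def generate_contrasting_pair():
    """Synthetic preference pair with no human annotation: a good
    response plus a deliberately degraded variant of it."""
    good = TARGET + random.gauss(0, 1.0)
    bad = good + random.uniform(3.0, 6.0)  # worse by construction
    return good, bad


def judge(w, a, b):
    """The 'judge' prefers the response nearer its current estimate w.
    (A real judge would emit a reasoning trace before the verdict.)"""
    return "A" if abs(a - w) < abs(b - w) else "B"


def self_taught_evaluator(iterations=5, pairs_per_iter=200, w0=6.0):
    """Iterative self-improvement: label synthetic pairs with the
    current judge, keep pairs whose verdict matches the construction
    (good preferred), then 'fine-tune' on them — here, re-estimating
    w as the mean of the kept good responses."""
    w = w0
    for _ in range(iterations):
        kept = []
        for _ in range(pairs_per_iter):
            good, bad = generate_contrasting_pair()
            if judge(w, good, bad) == "A":  # verdict agrees with label
                kept.append(good)
        if kept:  # stand-in for a fine-tuning step
            w = sum(kept) / len(kept)
    return w
```

Running `self_taught_evaluator()` moves the judge's estimate from the deliberately wrong starting point (6.0) toward the true target (10.0), mirroring how each round of self-generated training data improves the evaluator that produced it.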