Prediction: AI Benchmark with Most Improvement Due to Meta's Iterative RPO by End of 2024
GSM8K • 33% | MATH • 33% | ARC-Challenge • 34%
Resolution source: published AI research papers or official Meta announcements
Meta Boosts AI Model Accuracy with New Iterative RPO Method
May 1, 2024, 02:17 AM
Meta has developed and applied a new method, Iterative Reasoning Preference Optimization (Iterative RPO), to enhance the reasoning capabilities of its AI models, specifically Llama-2-70B-Chat. The method samples chain-of-thought candidates from the model, constructs preference pairs by labeling completions as winners or losers according to whether their final answers are correct, and trains the model on those pairs with a DPO-style preference loss, repeating the whole procedure over several iterations. Significant accuracy improvements were reported across reasoning benchmarks: GSM8K (from 55.6% to 81.6%), MATH (from 12.5% to 20.8%), and ARC-Challenge (from 77.8% to 86.7%). Separately, the LLM2Vec approach was applied to the Meta-Llama-3-8B model, improving its performance on text-embedding tasks.
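To make the training loop concrete, below is a minimal Python sketch of one Iterative RPO round. It is an illustration under stated assumptions, not Meta's released implementation: the helper callables sample_cot and answer_of are hypothetical stand-ins for candidate generation and final-answer extraction, and the loss follows the DPO-plus-NLL form described in the Iterative RPO paper, with assumed coefficient names beta and alpha.

import torch
import torch.nn.functional as F

def build_preference_pairs(prompts, gold_answers, sample_cot, answer_of, k=8):
    # Sample k chain-of-thought candidates per prompt, then pair every
    # correct completion (winner) with every incorrect one (loser).
    # sample_cot and answer_of are hypothetical user-supplied callables.
    pairs = []
    for prompt, gold in zip(prompts, gold_answers):
        candidates = [sample_cot(prompt) for _ in range(k)]
        winners = [c for c in candidates if answer_of(c) == gold]
        losers = [c for c in candidates if answer_of(c) != gold]
        pairs.extend((prompt, w, l) for w in winners for l in losers)
    return pairs

def rpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, alpha=1.0):
    # DPO preference term: push the trained model to rank the correct
    # chain of thought above the incorrect one, relative to a frozen
    # reference model. The added NLL term keeps probability mass on
    # the winning sequence; alpha is an assumed weighting coefficient.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo = -F.logsigmoid(margin)
    nll = -logp_w
    return (dpo + alpha * nll).mean()

# Toy check with dummy summed sequence log-probabilities.
logp_w = torch.tensor([-5.0]); logp_l = torch.tensor([-9.0])
print(rpo_loss(logp_w, logp_l, torch.tensor([-6.0]), torch.tensor([-8.0])))

In the iterative scheme, each round trains on pairs generated by the previous round's model, so the candidate pool improves as training proceeds.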