Will Google DeepMind's SCoRe approach achieve a 20% gain in self-correction for the MATH dataset by December 31, 2024?
Yes • 50%
No • 50%
Peer-reviewed publications or official Google DeepMind reports
Google DeepMind's SCoRe Achieves 15.6% Gain in Self-Correction for Language Models
Sep 20, 2024, 01:26 PM
Google DeepMind has developed SCoRe, a new multi-turn, chain-of-thought online reinforcement learning (RL) approach that improves the self-correction capabilities of large language models (LLMs). The method trains entirely on self-generated data and achieves state-of-the-art self-correction performance: a 15.6% gain on reasoning problems from the MATH dataset and a 9.1% improvement in other areas. The research, authored by Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, and Yi Su, finds that RL training on off-the-shelf datasets is less effective than training on on-policy data from the model being fine-tuned.
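The core idea of RL-based self-correction is to train on two-turn episodes (a first attempt followed by a revision) and shape the reward so the model is paid for genuinely fixing its own mistakes rather than merely repeating a correct answer. Below is a minimal, hypothetical sketch of that reward-shaping idea; the function name, signature, and bonus value are illustrative assumptions, not DeepMind's actual SCoRe implementation.

```python
# Hypothetical sketch of reward shaping for two-turn self-correction.
# The model produces a first attempt and then a revised attempt; the
# training signal rewards the revision, with a shaping bonus for
# flipping an incorrect first attempt to a correct one (an assumed
# illustration of the idea, not the published SCoRe objective).

def self_correction_reward(first_correct: bool,
                           second_correct: bool,
                           bonus: float = 0.5) -> float:
    """Reward for the revised attempt in a two-turn episode.

    Base reward is 1.0 if the revised answer is correct, else 0.0.
    A shaping bonus encourages genuine correction (wrong -> right)
    and penalizes regressions (right -> wrong).
    """
    reward = 1.0 if second_correct else 0.0
    if not first_correct and second_correct:
        reward += bonus   # fixed its own mistake
    elif first_correct and not second_correct:
        reward -= bonus   # broke a previously correct answer
    return reward
```

Under this shaping, an episode that corrects a wrong first attempt earns more than one that was simply right both times, which discourages the degenerate policy of leaving the first answer untouched.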