Google DeepMind's SCoRe Achieves 15.6% Gain in LLM Self-Correction for MATH
Sep 20, 2024, 05:06 PM
Google DeepMind has developed a multi-turn online reinforcement learning (RL) approach to improve the self-correction capabilities of large language models (LLMs). The new method, named SCoRe, trains entirely on self-generated data and achieves state-of-the-art self-correction performance. It addresses a key limitation of supervised fine-tuning (SFT), which has proven ineffective for self-correction because of the distribution mismatch between the training data and the model's own responses. The paper, titled 'Training Language Models to Self-Correct via Reinforcement Learning,' has drawn significant attention, including being highlighted on Hacker News among AI papers. SCoRe achieved a 15.6% gain in self-correction on reasoning problems from MATH and a 9.1% improvement on HumanEval coding problems.
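To make the reported numbers concrete, below is a minimal sketch of the two-turn self-correction protocol the paper evaluates: the model produces a first attempt, is prompted to revise it, and the gain is the difference in accuracy between the two turns. The model call is a stub, the correction prompt is a placeholder, and the outcome data is hypothetical; only the metric bookkeeping is shown.

```python
def self_correct(model, problem,
                 correction_prompt="There might be an error. Please revise your answer."):
    """Two-turn protocol: a first attempt, then a self-revision by the same model.

    `model` is a stand-in for an LLM call (problem text in, answer text out).
    """
    attempt1 = model(problem)
    attempt2 = model(f"{problem}\n{attempt1}\n{correction_prompt}")
    return attempt1, attempt2


def self_correction_metrics(results):
    """Summarize outcomes of the two-turn protocol.

    `results` is a list of (correct_at_turn1, correct_at_turn2) booleans.
    The headline "self-correction gain" corresponds to `delta` here.
    """
    n = len(results)
    acc1 = sum(c1 for c1, _ in results) / n
    acc2 = sum(c2 for _, c2 in results) / n
    fixed = sum((not c1) and c2 for c1, c2 in results) / n    # wrong -> right
    broken = sum(c1 and (not c2) for c1, c2 in results) / n   # right -> wrong
    return {"acc@t1": acc1, "acc@t2": acc2,
            "delta": acc2 - acc1, "fixed": fixed, "broken": broken}


# Toy illustration with hand-labelled (hypothetical) outcomes:
outcomes = [(False, True), (True, True), (False, False),
            (True, False), (False, True)]
metrics = self_correction_metrics(outcomes)
```

A positive `delta` means revision helps on net; SFT-based approaches often show a near-zero or negative `delta` because the model either leaves answers unchanged or "corrects" right answers into wrong ones, which is the failure mode SCoRe's RL training targets.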