Apple and EPFL Introduce AdEMAMix, a Novel AI Optimizer with 1.95x Improvement
Sep 10, 2024, 01:11 AM
Researchers from Apple and EPFL have introduced AdEMAMix, a novel optimizer that mixes two exponential moving averages of past gradients to make better use of older gradient information and improve large-scale model training. The optimizer, implemented in roughly 120 lines of code, claims up to a 1.95x improvement in data efficiency over the widely used AdamW optimizer: in the reported comparison, AdamW needed about 95% more training tokens to match AdEMAMix's performance. The approach replaces the single first-moment estimate in Adam's numerator with two exponential moving averages, a fast one with a low beta and a slow one with a high beta, a design that may also explain its strong results in related optimization settings, including FedOpt variants like DiLoCo.
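For concreteness, here is a minimal NumPy sketch of the dual-EMA update described above. Hyperparameter names (beta1, beta2, beta3, alpha) follow the paper's notation, but the alpha and beta3 warmup schedulers and other details of the authors' implementation are omitted, so treat this as an illustrative simplification under those assumptions rather than the official code.

```python
# Minimal sketch of the AdEMAMix update rule (assumed simplification, not the reference code).
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update: Adam's second moment plus two first-moment EMAs."""
    state["t"] += 1
    t = state["t"]

    # Fast EMA of gradients (low beta), as in Adam's first moment.
    state["m1"] = beta1 * state["m1"] + (1.0 - beta1) * grad
    # Slow EMA of gradients (high beta), keeping older gradients relevant longer.
    state["m2"] = beta3 * state["m2"] + (1.0 - beta3) * grad
    # EMA of squared gradients, as in Adam's second moment.
    state["nu"] = beta2 * state["nu"] + (1.0 - beta2) * grad ** 2

    # Bias correction for the fast EMA and the second moment (the slow EMA is left uncorrected).
    m1_hat = state["m1"] / (1.0 - beta1 ** t)
    nu_hat = state["nu"] / (1.0 - beta2 ** t)

    # Numerator mixes the fast and slow EMAs; denominator is Adam-style.
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(nu_hat) + eps)
    return theta - lr * (update + weight_decay * theta)

# Usage on a toy quadratic objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0, 3.0])
state = {"t": 0, "m1": np.zeros_like(theta),
         "m2": np.zeros_like(theta), "nu": np.zeros_like(theta)}
for _ in range(1000):
    theta = ademamix_step(theta, grad=theta.copy(), state=state, lr=1e-2)
print(theta)  # parameters move toward the origin
```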
Markets
No • 50%
Yes • 50%
Resolution source: Public announcements or press releases from major tech companies
Yes • 50%
No • 50%
Resolution source: Google Scholar or other academic citation databases
Yes • 50%
No • 50%
Resolution source: Official repositories of popular open-source AI frameworks like TensorFlow, PyTorch, etc.
No, it will not be featured in any major AI conferences • 25%
Yes, at NeurIPS 2024 • 25%
Yes, at ICML 2024 • 25%
Yes, at both NeurIPS and ICML • 25%
Resolution source: Conference agendas and presentations from major AI conferences like NeurIPS and ICML
Yes, in Apple's products • 25%
Yes, in other companies' products • 25%
Yes, in both Apple's and other companies' products • 25%
No, it will not be implemented in any commercial products • 25%
Resolution source: Public announcements or product documentation from companies
Yes, it will outperform SGD • 25%
No, it will not outperform any • 25%
Yes, it will outperform AdamW • 25%
Yes, it will outperform AdaGrad • 25%
Resolution source: Results from benchmark competitions at major AI conferences such as NeurIPS and ICML