Apple and EPFL Introduce AdEMAMix, a Novel AI Optimizer with 1.95x Improvement
Sep 10, 2024, 01:11 AM
Researchers from Apple and EPFL have introduced AdEMAMix, a novel optimizer that augments Adam with a second exponential moving average of gradients to make better use of old gradient information and improve large-scale model training. The optimizer, implemented in roughly 120 lines of code, claims a 1.95x speedup over the widely used AdamW optimizer: AdEMAMix reaches the same performance with about half as many training tokens, meaning AdamW would need roughly 95% more tokens to match it. The approach replaces Adam's single first-moment estimate with two exponential moving averages in the numerator, a fast one with a low beta and a slow one with a high beta, which could explain its strong performance across optimization scenarios, including FedOpt variants such as DiLoCo.
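The story doesn't reproduce the update rule itself, but based on the description above, here is a minimal NumPy sketch of one AdEMAMix step. The hyperparameter defaults (beta3 = 0.9999, alpha = 5) follow values reported in the paper; the paper's warmup schedulers for alpha and beta3 are omitted, and the function and state names are this sketch's own, not the authors' reference implementation.

```python
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update: Adam with a second, slow EMA in the numerator.

    A sketch under stated assumptions; beta3/alpha warmup schedulers
    from the paper are omitted for brevity.
    """
    state["t"] += 1
    t = state["t"]

    # Fast EMA: standard Adam first moment, bias-corrected.
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    m1_hat = state["m1"] / (1 - beta1 ** t)

    # Slow EMA with a high beta, retaining much older gradients;
    # kept uncorrected here (the paper warms up alpha/beta3 instead).
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad

    # Second moment, as in Adam (bias-corrected).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    v_hat = state["v"] / (1 - beta2 ** t)

    # Mix the two EMAs in the numerator; decoupled weight decay as in AdamW.
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)
    return theta - lr * (update + weight_decay * theta)

# Example: a few steps on a toy quadratic, f(theta) = 0.5 * ||theta||^2,
# whose gradient is simply theta.
theta = np.array([1.0, -2.0, 3.0])
state = {"t": 0, "m1": np.zeros_like(theta),
         "m2": np.zeros_like(theta), "v": np.zeros_like(theta)}
for _ in range(100):
    theta = ademamix_step(theta, grad=theta, state=state)
print(theta)  # should drift toward zero
```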
[Attached prediction markets: whether AdEMAMix will be featured at a major AI conference (NeurIPS 2024, ICML 2024, both, or neither) and whether it will be implemented in commercial products (Apple's, other companies', both, or neither); the prompts for the remaining yes/no and company-choice markets are not shown.]