AdaMoment: A unified adaptive-momentum framework for robust learning rate optimization


Date

2026

Journal Title

Journal ISSN

Volume Title

Publisher

Elsevier

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The learning rate remains one of the most critical yet poorly understood hyperparameters in machine learning optimization, where adaptive methods (e.g., Adam) and momentum-based techniques (e.g., Nesterov acceleration) often suffer from inconsistent convergence, hyperparameter sensitivity, and limited robustness. To address these gaps, we propose AdaMoment, a unified framework integrating adaptive learning rate scaling, Nesterov momentum, and gradient smoothing, featuring (1) dynamic decay mechanisms for stable adaptation, (2) lookahead gradient updates to reduce oscillation, and (3) bias-corrected dual-moment tracking for noise robustness. Experiments across convex and non-convex benchmarks demonstrate AdaMoment's superiority over SGD, Adam, and RMSProp, achieving 27% faster convergence, 18-24% lower final loss, 15-20% reductions in MAE/RMSE, and strong generalization (R² > 0.95). Theoretically, we provide non-convex convergence guarantees, bridging adaptive and momentum-based methods; practically, our framework reduces manual tuning while scaling across architectures. This work advances robust, automated learning rate selection and elucidates the adaptation-momentum interplay in optimization.
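To make the abstract's three ingredients concrete, the sketch below combines a Nesterov-style lookahead gradient, Adam-style bias-corrected dual-moment tracking, and a dynamic step-size decay in a single update. The paper's exact algorithm is not reproduced here; the function `adamoment_step` and all hyperparameter defaults are illustrative assumptions, not AdaMoment as published.

```python
import numpy as np

def adamoment_step(w, grad_fn, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
                   mu=0.9, decay=1e-3, eps=1e-8):
    """One hypothetical AdaMoment-style update (illustrative sketch only).

    w       : parameter vector
    grad_fn : callable returning the gradient at a given point
    m, v    : first and second moment estimates (dual-moment tracking)
    t       : 1-based step counter
    """
    # Lookahead: evaluate the gradient at a momentum-extrapolated point
    # (Nesterov-flavored, intended to damp oscillation).
    g = grad_fn(w - mu * lr * m)
    # Dual-moment tracking: exponential averages of gradient and squared gradient.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g**2
    # Bias correction, as in Adam, so early steps are not underscaled.
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    # Dynamic decay of the base step size for stable late-stage adaptation.
    lr_t = lr / (1.0 + decay * t)
    # Adaptive scaling: per-coordinate step normalized by the second moment.
    w = w - lr_t * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

A quick usage example on the convex objective f(w) = w², whose gradient is 2w:

```python
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    w, m, v = adamoment_step(w, lambda x: 2.0 * x, m, v, t, lr=0.1)
# w is now close to the minimizer at 0
```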

Description

Keywords

Adaptive optimization, Learning rate scheduling, Nesterov momentum, Non-convex optimization, Hybrid optimizer, Gradient noise robustness

Source

Knowledge-Based Systems

WoS Q Value

Q1

Scopus Q Value

Q1

Volume

332

Issue

Citation