Author: Dagal, Idriss
Dates: 2026-01-31; 2026-01-31; 2026
ISSN: 0950-7051; eISSN: 1872-7409
DOI: https://doi.org/10.1016/j.knosys.2025.114739
Handle: https://hdl.handle.net/20.500.12662/10658

Abstract: The learning rate remains one of the most critical yet poorly understood hyperparameters in machine learning optimization, where adaptive methods (e.g., Adam) and momentum-based techniques (e.g., Nesterov acceleration) often suffer from inconsistent convergence, hyperparameter sensitivity, and limited robustness. To address these gaps, we propose AdaMoment, a unified framework integrating adaptive learning rate scaling, Nesterov momentum, and gradient smoothing, featuring (1) dynamic decay mechanisms for stable adaptation, (2) lookahead gradient updates to reduce oscillation, and (3) bias-corrected dual-moment tracking for noise robustness. Experiments across convex and non-convex benchmarks demonstrate AdaMoment's superiority over SGD, Adam, and RMSProp, achieving 27% faster convergence, 18-24% lower final loss, 15-20% reductions in MAE/RMSE, and strong generalization (R² > 0.95). Theoretically, we provide non-convex convergence guarantees, bridging adaptive and momentum-based methods; practically, our framework reduces manual tuning while scaling across architectures. This work advances robust, automated learning rate selection and elucidates the adaptation-momentum interplay in optimization.

Language: en
Rights: info:eu-repo/semantics/closedAccess
Keywords: Adaptive optimization; Learning rate scheduling; Nesterov momentum; Non-convex optimization; Hybrid optimizer and gradient noise robustness
Title: AdaMoment: A unified adaptive-momentum framework for robust learning rate optimization
Type: Article
DOI: 10.1016/j.knosys.2025.114739
Scopus ID: 2-s2.0-105022263724, Q1
Volume: 332
WoS ID: WOS:001627929100001, Q1
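To make the components named in the abstract concrete, below is a minimal sketch of what an AdaMoment-style update step could look like: bias-corrected dual-moment tracking (as in Adam), a Nesterov-style lookahead blend of the corrected momentum with the current gradient (as in Nadam), and a simple dynamic learning-rate decay. The function `adamoment_step`, its hyperparameters, and the decay schedule are illustrative assumptions; the abstract does not give the paper's exact update rules.

```python
import numpy as np

def adamoment_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                   decay=1e-4, eps=1e-8):
    """One step of a hypothetical AdaMoment-style optimizer (illustrative only)."""
    t = state["t"] + 1
    # Dual-moment tracking: smoothed gradient and squared-gradient estimates.
    m = beta1 * state["m"] + (1 - beta1) * grads
    v = beta2 * state["v"] + (1 - beta2) * grads ** 2
    # Bias correction of both moments.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style lookahead: blend current gradient into the corrected momentum.
    m_nesterov = beta1 * m_hat + (1 - beta1) * grads / (1 - beta1 ** t)
    # Dynamic decay of the step size over iterations.
    lr_t = lr / (1 + decay * t)
    params = params - lr_t * m_nesterov / (np.sqrt(v_hat) + eps)
    return params, {"t": t, "m": m, "v": v}

# Usage on a toy quadratic f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([2.0, -3.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
for _ in range(500):
    g = w                      # gradient of the quadratic objective
    w, state = adamoment_step(w, g, state, lr=0.05)
print(w)                       # approaches the minimizer at the origin
```

This sketch only shows how the three ingredients listed in the abstract compose into a single update rule; the published method may differ in its decay mechanism, lookahead formulation, and convergence analysis.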