Concept
Half-Life of Adadelta Parameter Updates
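In the Adadelta optimization algorithm, the decay parameter ρ controls the leaky averages of the two state variables: the running average of squared gradients and the running average of squared parameter changes. The value chosen for ρ determines how long past statistics keep influencing the adaptive learning rate. For instance, setting ρ = 0.9 corresponds to what D2L calls a half-life of 10 for each parameter update: the leaky average behaves roughly like an average over the most recent 1/(1 − ρ) = 10 updates, so statistics from much earlier iterations contribute little. Strictly speaking, the weight ρ^k on a term that is k steps old falls to one half after ln 2 / ln(1/ρ) ≈ 6.6 steps, so "10" is best read as the effective averaging window. This default tends to work well in practice for stabilizing the adaptive learning rate.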
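The arithmetic behind the "half-life" can be checked with a short sketch. It assumes the D2L-style formulation of Adadelta for a single scalar parameter with ε = 1e-5; the helper names `effective_window`, `weight_half_life` and `adadelta_step`, and the toy objective f(x) = x², are hypothetical and chosen only for illustration, not taken from any library.

```python
import math

def effective_window(rho):
    """Approximate number of recent steps a leaky average with decay rho covers."""
    return 1.0 / (1.0 - rho)

def weight_half_life(rho):
    """Steps k after which a past term's weight rho**k drops to one half."""
    return math.log(0.5) / math.log(rho)

def adadelta_step(x, g, s, delta, rho=0.9, eps=1e-5):
    """One Adadelta update for a scalar parameter.

    Assumed (D2L-style) formulation: s and delta are the two state variables,
    both updated as leaky averages with the same decay rho.
    """
    s = rho * s + (1 - rho) * g * g                        # leaky average of squared gradients
    g_rescaled = math.sqrt(delta + eps) / math.sqrt(s + eps) * g
    x = x - g_rescaled                                     # parameter update
    delta = rho * delta + (1 - rho) * g_rescaled ** 2      # leaky average of squared updates
    return x, s, delta

print(f"{effective_window(0.9):.1f}")   # 10.0
print(f"{weight_half_life(0.9):.2f}")   # 6.58

# Toy run on the hypothetical objective f(x) = x**2, whose gradient is 2 * x.
x, s, delta = 5.0, 0.0, 0.0
for _ in range(200):
    x, s, delta = adadelta_step(x, 2 * x, s, delta)
print(f"x after 200 steps: {x:.4f}")
```

Both state variables share the same ρ, so each "forgets" its history on the same time scale; with ρ = 0.9 the effective window is 10 steps while the weight on any single past term halves after about 6.6 steps.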
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L