Concept

Half-Life of Adadelta Parameter Updates

In the Adadelta optimization algorithm, the decay parameter $\rho$ controls the leaky averages that serve as the state variables: running averages of the squared gradients and of the squared parameter changes. The value of $\rho$ sets how quickly old statistics are forgotten, since a contribution from $k$ steps ago is weighted by $\rho^k$. For instance, $\rho = 0.9$ gives an effective averaging horizon of about $1/(1-\rho) = 10$ updates, which is often loosely described as a half-life of 10. Strictly, a past contribution falls to $0.9^{10} \approx 0.35$ of its original weight after 10 iterations and is halved after $\ln 0.5 / \ln 0.9 \approx 6.6$ iterations. This commonly used value is reported to work well in practice for stabilizing the adaptive learning rate.
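
The decay described above can be checked numerically. The following is a minimal sketch, assuming the usual leaky-average form $s_t = \rho s_{t-1} + (1-\rho) g_t^2$; the helper names (`leaky_average_weights`, `exact_half_life`, `time_constant`) are hypothetical and are not part of any library.

```python
import math

def leaky_average_weights(rho, steps):
    """Weight the leaky average places on the input from k steps ago.

    Assumes s_t = rho * s_{t-1} + (1 - rho) * g_t**2, so the input from
    k steps back carries weight (1 - rho) * rho**k.
    """
    return [(1 - rho) * rho**k for k in range(steps)]

def exact_half_life(rho):
    """Number of steps after which a past contribution is exactly halved."""
    return math.log(0.5) / math.log(rho)

def time_constant(rho):
    """Effective averaging horizon 1 / (1 - rho), the loosely quoted 'half-life'."""
    return 1.0 / (1.0 - rho)

rho = 0.9
weights = leaky_average_weights(rho, 11)

# Fraction of the original weight that remains after 10 iterations (rho**10).
ratio_after_10 = weights[10] / weights[0]

print(f"time constant 1/(1-rho): {time_constant(rho):.2f}")
print(f"exact half-life:         {exact_half_life(rho):.2f}")
print(f"weight left after 10:    {ratio_after_10:.3f}")
```

Under these assumptions, the horizon $1/(1-\rho)$ evaluates to 10 for $\rho = 0.9$, while the point at which the weight is exactly halved comes a few iterations earlier; after 10 iterations roughly a third of the original weight remains.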

Updated 2026-05-16

Tags

D2L

Dive into Deep Learning @ D2L