1Cademy - Adadelta Update Rule

Learn Before

Adadelta

Formula

Adadelta Update Rule

Adadelta updates parameters through a sequence of operations built on two leaky averages. Given a decay parameter $\rho$, the state variable that tracks the second moment of the gradient is updated as $\mathbf{s}_t = \rho \mathbf{s}_{t-1} + (1 - \rho) \mathbf{g}_t^2$. A rescaled gradient $\mathbf{g}_t'$ is then computed by multiplying the gradient by the ratio of the root mean square of previous parameter changes to the root mean square of the gradients: $\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t$, where $\epsilon$ is a small constant (e.g., $10^{-5}$) added to maintain numerical stability. The model parameters are updated by subtracting this rescaled gradient: $\mathbf{x}_t = \mathbf{x}_{t-1} - \mathbf{g}_t'$. Finally, the second state variable $\Delta\mathbf{x}_t$, a leaky average of the squared rescaled gradients (that is, of the squared parameter changes) initialized at $\Delta \mathbf{x}_0 = 0$, is updated as $\Delta \mathbf{x}_t = \rho \Delta\mathbf{x}_{t-1} + (1 - \rho) {\mathbf{g}_t'}^2$. All squares, square roots, and divisions are applied elementwise.
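The four equations above can be sketched as a single update step. This is a minimal, plain-Python illustration rather than a reference implementation: the function name `adadelta_step`, the default `rho=0.9`, and the toy objective $f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2$ are assumptions chosen for the example, not part of the formula itself.

```python
import math


def adadelta_step(x, grad, s, delta, rho=0.9, eps=1e-5):
    """Apply one Adadelta update elementwise to lists of floats.

    x, grad, s, delta are equal-length lists; returns the new (x, s, delta).
    rho=0.9 is an illustrative default, not prescribed by the formula.
    """
    new_x, new_s, new_delta = [], [], []
    for x_i, g_i, s_i, d_i in zip(x, grad, s, delta):
        # s_t = rho * s_{t-1} + (1 - rho) * g_t^2
        s_i = rho * s_i + (1 - rho) * g_i * g_i
        # g'_t = sqrt(dx_{t-1} + eps) / sqrt(s_t + eps) * g_t
        g_scaled = math.sqrt(d_i + eps) / math.sqrt(s_i + eps) * g_i
        # x_t = x_{t-1} - g'_t
        x_i = x_i - g_scaled
        # dx_t = rho * dx_{t-1} + (1 - rho) * g'_t^2
        d_i = rho * d_i + (1 - rho) * g_scaled * g_scaled
        new_x.append(x_i)
        new_s.append(s_i)
        new_delta.append(d_i)
    return new_x, new_s, new_delta


def grad_f(x):
    """Gradient of the toy objective f(x) = 0.1 * x1^2 + 2 * x2^2 (assumed)."""
    return [0.2 * x[0], 4.0 * x[1]]


# Both state variables start at zero, matching dx_0 = 0 in the text.
x = [-5.0, -2.0]
s = [0.0, 0.0]
delta = [0.0, 0.0]
for _ in range(50):
    x, s, delta = adadelta_step(x, grad_f(x), s, delta)
```

Note that no learning rate appears anywhere: the step size comes entirely from the ratio of the two leaky averages. Because $\Delta \mathbf{x}_0 = 0$, the very first step has magnitude of roughly $\sqrt{\epsilon / (1 - \rho)}$ per coordinate, so early progress is slow and depends on the choice of $\epsilon$.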

Updated 2026-05-16

References

Dive into Deep Learning

Learn After

Half-Life of Adadelta Parameter Updates
