Formula
Adadelta Update Rule
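The Adadelta algorithm updates parameters through a sequence of operations built on leaky averages. Given a decay parameter $\rho$, the state variable for the gradient's second moment is updated as $\mathbf{s}_t = \rho \mathbf{s}_{t-1} + (1 - \rho) \mathbf{g}_t^2$. A rescaled gradient is then computed using the ratio of the root mean square of previous parameter changes to the root mean square of the gradients: $\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t$. The model parameters are updated by subtracting this rescaled gradient: $\mathbf{x}_t = \mathbf{x}_{t-1} - \mathbf{g}_t'$. Finally, the state variable tracking the parameter changes, initialized at $\Delta\mathbf{x}_0 = \mathbf{0}$, is updated as $\Delta\mathbf{x}_t = \rho \Delta\mathbf{x}_{t-1} + (1 - \rho) {\mathbf{g}_t'}^2$. Here $\epsilon$ is a small constant (e.g., $10^{-5}$) added inside both square roots to maintain numerical stability, and all squares and square roots are applied elementwise.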
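The four updates can be sketched in plain Python to make the order of operations concrete. This is a minimal illustration, not a reference implementation: the function name `adadelta_step`, the list-of-floats representation, and the default `rho=0.9` are assumptions made for this example, while `eps=1e-5` follows the value mentioned above.

```python
import math


def adadelta_step(x, grad, s, delta, rho=0.9, eps=1e-5):
    """Apply one Adadelta update to a list of parameters (hypothetical helper).

    x, grad, s and delta are equal-length lists of floats. s holds the leaky
    average of squared gradients, delta the leaky average of squared
    parameter changes. Returns new (x, s, delta); inputs are not mutated.
    """
    new_x, new_s, new_delta = [], [], []
    for x_i, g_i, s_i, d_i in zip(x, grad, s, delta):
        # Leaky average of the squared gradient.
        s_i = rho * s_i + (1 - rho) * g_i * g_i
        # Rescale the gradient by RMS(previous changes) / RMS(gradients).
        g_prime = math.sqrt(d_i + eps) / math.sqrt(s_i + eps) * g_i
        # Parameter update: subtract the rescaled gradient.
        new_x.append(x_i - g_prime)
        # Leaky average of the squared rescaled gradient.
        new_delta.append(rho * d_i + (1 - rho) * g_prime * g_prime)
        new_s.append(s_i)
    return new_x, new_s, new_delta


# Usage: a few steps on f(x) = x1^2 + 2 * x2^2 (an illustrative objective).
x, s, delta = [5.0, -3.0], [0.0, 0.0], [0.0, 0.0]
for _ in range(10):
    grad = [2 * x[0], 4 * x[1]]
    x, s, delta = adadelta_step(x, grad, s, delta)
```

Note that no learning rate appears: the step size comes entirely from the ratio of the two running averages. Because `delta` starts at zero, the first steps are on the order of `sqrt(eps)` and grow as `delta` accumulates.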
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L