Learn Before
  • Deep Learning Optimizer Algorithms

AdaDelta (Deep Learning Optimization Algorithm)

Like RMSProp, AdaDelta (Adaptive Delta) was proposed to compensate for the shortcomings of AdaGrad. As in RMSProp, AdaDelta replaces the accumulated sum of squared gradients (often denoted G) with an exponentially decaying mean. In addition, instead of using a fixed step size η, AdaDelta scales each update by an exponentially decaying mean of the squared parameter updates themselves, so the method does not require a manually tuned learning rate.

G = \gamma G + (1-\gamma)(\nabla_{\theta}J(\theta_t))^2
\Delta_{\theta} = \frac{\sqrt{s+\epsilon}}{\sqrt{G+\epsilon}} \cdot \nabla_{\theta}J(\theta_t)
\theta = \theta - \Delta_{\theta}
s = \gamma s + (1-\gamma)\Delta_{\theta}^2
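As a concrete illustration, below is a minimal NumPy sketch of a single AdaDelta update following the four equations above. The function name adadelta_step, the decay value gamma=0.95, and eps=1e-6 are illustrative assumptions, not part of the original text.

```python
import numpy as np

def adadelta_step(theta, grad, G, s, gamma=0.95, eps=1e-6):
    """One AdaDelta update step (illustrative sketch).

    theta : parameters
    grad  : gradient of J at theta_t
    G     : decaying mean of squared gradients
    s     : decaying mean of squared parameter updates
    """
    # Exponentially decaying mean of squared gradients
    G = gamma * G + (1 - gamma) * grad ** 2
    # Update scaled by RMS of past updates over RMS of gradients (no fixed step size)
    delta = (np.sqrt(s + eps) / np.sqrt(G + eps)) * grad
    theta = theta - delta
    # Exponentially decaying mean of squared updates
    s = gamma * s + (1 - gamma) * delta ** 2
    return theta, G, s

# Toy usage: take a few AdaDelta steps on f(theta) = theta^2
theta = np.array([5.0])
G = np.zeros_like(theta)
s = np.zeros_like(theta)
for _ in range(1000):
    grad = 2 * theta  # gradient of theta^2
    theta, G, s = adadelta_step(theta, grad, G, s)
```

Because the numerator starts from s = 0, early updates are tiny; this slow start is a known characteristic of AdaDelta rather than a bug in the sketch.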


Tags

Data Science

Related
  • Mini-Batch Gradient Descent

  • Gradient Descent with Momentum

  • An overview of gradient descent optimization algorithms

  • Learning Rate Decay

  • Gradient Descent

  • AdaDelta (Deep Learning Optimization Algorithm)

  • Adam (Deep Learning Optimization Algorithm)

  • RMSprop (Deep Learning Optimization Algorithm)

  • AdaGrad (Deep Learning Optimization Algorithm)

  • Nesterov momentum (Deep Learning Optimization Algorithm)

  • Challenges with Deep Learning Optimizer Algorithms

  • Adam optimization algorithm

  • Difference between Adam and SGD

Learn After
  • ADADELTA: An Adaptive Learning Rate Method