Learn Before
Concept

Adam optimization algorithm

Adam stands for: adaptive moment estimation. Briefly, this method combines momentum and RMSprop (root mean squared prop): it smooths the gradient with a momentum-style moving average, and at the same time rescales the update using RMSprop-style squared-gradient averaging. This combined approach is best explained mathematically:
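The standard Adam update rule, written in the usual notation (gradient g_t at step t, parameters theta):

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t
  &&\text{(momentum: moving average of gradients)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
  &&\text{(RMSprop: moving average of squared gradients)} \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}
  &&\text{(bias correction for zero initialization)} \\
\theta_t &= \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
  &&\text{(parameter update)}
\end{aligned}
```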

Adam introduces four hyperparameters:

  • learning rate alpha
  • beta1 from momentum (usually 0.9)
  • beta2 from RMSprop (usually 0.999)
  • epsilon (usually 1e-8)

As mentioned above, you usually do not need to tune beta1, beta2, and epsilon, as the values listed above generally work well. Only the learning rate is left to tune in order to accelerate training.

Adam combines the advantages of two optimization algorithms, AdaGrad and RMSProp. It considers both the first moment estimate of the gradient (the mean of the gradients) and the second moment estimate (the uncentered variance of the gradients), and uses them together to compute the update step size.
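The update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; the function name `adam_step` and its signature are chosen for this example:

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    # First moment estimate: momentum-style moving average of gradients
    m = beta1 * m + (1 - beta1) * grad
    # Second moment estimate: RMSprop-style average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction (m and v start at zero, so early estimates are biased)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update scaled by alpha and normalized by sqrt(v_hat)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Running this step repeatedly on a simple quadratic loss (gradient 2*theta) drives theta toward its minimum at zero, which is an easy way to sanity-check the update.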

Updated 2021-10-30

Tags

Data Science