Learn Before
  • Exponentially Weighted Average

  • Deep Learning Optimizer Algorithms

  • (Batch) Gradient Descent (Deep Learning Optimization Algorithm)

Gradient Descent with Momentum

The basic idea of gradient descent with momentum is to compute an exponentially weighted average of your gradients and then use that average, rather than the raw gradient, to update your weights. It almost always converges faster than the standard gradient descent algorithm. A minimal sketch is shown below.
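
The following is a minimal NumPy sketch of the idea, assuming a generic gradient function; the helper name momentum_update and the quadratic toy example are illustrative assumptions, not part of the original card.

```python
import numpy as np

def momentum_update(w, grad_fn, alpha=0.01, beta=0.9, num_steps=100):
    """Gradient descent with momentum: maintain an exponentially weighted
    average v of past gradients and step in that direction instead of
    the raw gradient."""
    v = np.zeros_like(w)
    for _ in range(num_steps):
        grad = grad_fn(w)                 # gradient at the current parameters
        v = beta * v + (1 - beta) * grad  # exponentially weighted average of gradients
        w = w - alpha * v                 # update weights using the averaged gradient
    return w

# Illustrative (assumed) example: an elongated quadratic bowl f(w) = 0.5 * w^T A w,
# where plain gradient descent tends to oscillate along the steep direction.
A = np.diag([1.0, 25.0])
grad_fn = lambda w: A @ w
w0 = np.array([5.0, 5.0])
w_opt = momentum_update(w0, grad_fn, alpha=0.03, beta=0.9, num_steps=200)
print(w_opt)  # approaches [0, 0]
```

With β = 0.9 the average effectively smooths the last ~10 gradients, which damps oscillations in steep directions while accumulating speed along consistent directions.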

Tags

Data Science

Related
  • An Example of Exponentially Weighted Average

  • Gradient Descent with Momentum

  • Bias Correction

  • RMSprop (Deep Learning Optimization Algorithm)

  • Mini-Batch Gradient Descent

  • An overview of gradient descent optimization algorithms

  • Learning Rate Decay

  • Gradient Descent

  • AdaDelta (Deep Learning Optimization Algorithm)

  • Adam (Deep Learning Optimization Algorithm)

  • AdaGrad (Deep Learning Optimization Algorithm)

  • Nesterov momentum (Deep Learning Optimization Algorithm)

  • Challenges with Deep Learning Optimizer Algorithms

  • Adam optimization algorithm

  • Difference between Adam and SGD

  • Logistic regression gradient descent

  • Derivation of the Gradient Descent Formula

  • Epoch in Gradient Descent

  • Batch vs Stochastic vs Mini-Batch Gradient Descent

  • For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$?

  • Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.

  • Backward Propagation

Learn After
  • Intuition behind Gradient Descent with Momentum

  • These plots were generated with gradient descent, with gradient descent with momentum (β = 0.5), and with gradient descent with momentum (β = 0.9). Which curve corresponds to which algorithm?

  • Gradient Descent with Momentum Pseudocode

  • Adam (Deep Learning Optimization Algorithm)