Derivation of the Gradient Descent Formula
The gradient of a scalar function $f$ is defined as the unique vector field whose dot product with any unit vector $\mathbf{u}$ at each point $\mathbf{x}$ is the directional derivative of $f$ along $\mathbf{u}$. That is, $\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}$.
The directional derivative in direction $\mathbf{u}$ (a unit vector) is the slope of the function $f$ in direction $\mathbf{u}$, namely the rate of increase of $f$ per unit of distance moved in the direction given by $\mathbf{u}$.
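As a quick numerical check of this definition, here is a minimal Python sketch (the quadratic $f$, the point, and the direction are illustrative choices, not from the original) comparing a finite-difference estimate of the slope along $\mathbf{u}$ with the dot product $\nabla f(\mathbf{x}) \cdot \mathbf{u}$.

```python
import numpy as np

# Illustrative function f(x) = x0^2 + 3*x1^2 and its analytic gradient.
def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])
u = np.array([3.0, 4.0]) / 5.0  # a unit vector

# Finite-difference estimate of the slope of f along u ...
h = 1e-6
directional_fd = (f(x + h * u) - f(x)) / h

# ... which should match the dot product of the gradient with u.
directional_dot = grad_f(x) @ u

print(directional_fd, directional_dot)  # both approximately -8.4
```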
To minimize $f$, we would like to find the direction in which $f$ decreases the fastest. We can do this using the directional derivative: $\min_{\mathbf{u},\, \|\mathbf{u}\|_2 = 1} \mathbf{u}^{\top} \nabla f(\mathbf{x}) = \min_{\mathbf{u},\, \|\mathbf{u}\|_2 = 1} \|\mathbf{u}\|_2 \, \|\nabla f(\mathbf{x})\|_2 \cos\theta$, where $\theta$ is the angle between $\mathbf{u}$ and the gradient. Substituting in $\|\mathbf{u}\|_2 = 1$ and ignoring factors that do not depend on $\mathbf{u}$, this simplifies to $\min_{\mathbf{u}} \cos\theta$.
This is minimized when $\mathbf{u}$ points in the opposite direction from the gradient, i.e. when $\cos\theta = -1$. In other words, the gradient points directly uphill, and the negative gradient points directly downhill. We can decrease $f$ by moving in the direction of the negative gradient.
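To illustrate this claim, the following Python sketch (same illustrative quadratic as above; all values are assumptions for demonstration) samples many unit directions, finds the one that decreases $f$ the most for a small step, and checks that it nearly coincides with the negative normalized gradient.

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])

# Sample many unit directions and measure the change in f for a small step along each.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(10_000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

h = 1e-3
changes = np.array([f(x + h * u) - f(x) for u in dirs])

best = dirs[np.argmin(changes)]                    # steepest-descent direction found by search
neg_grad = -grad_f(x) / np.linalg.norm(grad_f(x))  # negative normalized gradient

print(best, neg_grad)  # the two directions nearly coincide
```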
Hence we have $\mathbf{x}' = \mathbf{x} - \epsilon \nabla f(\mathbf{x})$, where $\epsilon$ is the learning rate, a positive scalar determining the size of the step.
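A minimal sketch of this update rule in Python, again assuming the illustrative quadratic $f$ above and an arbitrary learning rate (both are assumptions, not values from the original):

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])  # initial point (illustrative)
epsilon = 0.1              # learning rate (illustrative)

# Repeatedly apply the update x <- x - epsilon * grad f(x).
for step in range(100):
    x = x - epsilon * grad_f(x)

print(x, f(x))  # x approaches the minimizer (0, 0)
```

With a sufficiently small learning rate each step moves in the locally steepest-descent direction, so $f$ decreases; too large a learning rate can overshoot and diverge.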