1Cademy - Effect of a Small Learning Rate on Gradient Descent

Learn Before

Learning Rate

Example

Effect of a Small Learning Rate on Gradient Descent

When the learning rate $\eta$ is too small, each gradient descent update moves the parameter $x$ only a tiny distance toward the optimum, so the algorithm needs many iterations to reach a satisfactory solution. For instance, for the quadratic $f(x) = x^2$ the update is $x \leftarrow x - \eta f'(x) = (1 - 2\eta)x$, so with $\eta = 0.05$ every step shrinks $x$ by only a factor of $0.9$. Starting from $x = 10$, the parameter is still $10 \cdot 0.9^{10} \approx 3.49$ after $10$ iterations, far from the optimal solution at $x = 0$. A small learning rate keeps the first-order Taylor approximation accurate, so the function value decreases at every step, but the practical cost is very slow convergence.
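The numbers in this example can be reproduced with a few lines of Python. This is a minimal sketch rather than a reference implementation: the helper names `gradient_descent` and `grad_f` are illustrative choices, and the gradient $f'(x) = 2x$ is written out by hand.

```python
def gradient_descent(grad, x0, eta, steps):
    """Run plain gradient descent and return every iterate, starting with x0."""
    xs = [x0]
    for _ in range(steps):
        # Standard update: step against the gradient, scaled by the learning rate.
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return xs


def grad_f(x):
    # Derivative of f(x) = x**2 (hand-coded for this example).
    return 2 * x


# Same setting as the text: eta = 0.05, start at x = 10, take 10 steps.
trajectory = gradient_descent(grad_f, x0=10.0, eta=0.05, steps=10)
x_final = trajectory[-1]
print(f"x after 10 steps: {x_final:.4f}")  # about 3.49, still far from 0
```

Each iterate is $0.9$ times the previous one, so the sequence does move steadily toward $0$, just slowly. Rerunning the sketch with a larger `steps` value shows how many more updates it takes to get close to the optimum.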

Updated 2026-05-15

References

Dive into Deep Learning

Learn After

Adaptive Optimization Methods
