Learn Before
Example
Effect of a Small Learning Rate on Gradient Descent
When the learning rate is chosen to be too small, each gradient descent update moves the parameter only a tiny distance toward the optimum. This results in extremely slow progress, with the algorithm requiring a large number of iterations to reach a satisfactory solution. For instance, applying gradient descent to the quadratic with and starting from , the parameter value is still approximately after iterations—far from the optimal solution at . While a small learning rate ensures that the first-order Taylor approximation remains valid and the function value decreases at every step, the practical cost is an unacceptably slow convergence rate.
0
1
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L