Learning Rate
The learning rate, commonly written η, is a positive scalar chosen by the algorithm designer that controls the size of each parameter update in gradient descent. It scales the gradient directly, so each iteration moves the parameters by η times the gradient, in the negative gradient direction: x ← x − η∇f(x). Setting η appropriately is critical. A value that is too small makes each update tiny, so many more iterations are needed to approach the optimum. A value that is too large makes the step so big that the first-order Taylor approximation, which justifies moving along the negative gradient, no longer holds; the iterates can then overshoot the minimum and diverge rather than converge.
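The effect of the learning rate can be seen in a minimal sketch. The objective f(x) = x², the starting point 10.0, the step count, and the three values of η below are all illustrative assumptions, not taken from the text above; the function name `gradient_descent` is likewise hypothetical.

```python
def gradient_descent(eta, x0=10.0, steps=10):
    """Run plain gradient descent on f(x) = x**2 (assumed toy objective).

    The gradient of f is 2*x, so each update is x <- x - eta * 2*x.
    """
    x = x0
    for _ in range(steps):
        grad = 2 * x          # gradient of x**2 at the current point
        x -= eta * grad       # move against the gradient, scaled by eta
    return x


# Compare a small, a moderate, and an overly large learning rate.
for eta in (0.01, 0.3, 1.1):
    print(f"eta={eta}: x after 10 steps = {gradient_descent(eta):.4f}")
```

On this toy problem each step multiplies x by (1 − 2η). With η = 0.01 the iterate is still far from the minimum at 0 after ten steps, with η = 0.3 it is very close to 0, and with η = 1.1 the factor has magnitude greater than 1, so the iterate overshoots on every step and grows instead of shrinking.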
Updated 2026-05-18
Tags
Data Science
D2L
Dive into Deep Learning @ D2L