Learn Before
Concept

Learning Rate

The learning rate η\eta is a positive scalar chosen by the algorithm designer that controls the size of each parameter update step in gradient descent. It directly scales the gradient to determine how far the parameters move in the negative gradient direction at each iteration. Setting η\eta appropriately is critical: a value that is too small results in very slow updates, requiring many more iterations to approach the optimum, while a value that is too large can cause the update step ηf(x)\eta f'(x) to become so large that the first-order Taylor approximation breaks down, potentially causing the iterates to overshoot the minimum and diverge rather than converge.

0

1

Updated 2026-05-18

Tags

Data Science

D2L

Dive into Deep Learning @ D2L