Learn Before
Theory
Gradient Descent Convergence on a Scalar Quadratic
When applying gradient descent to minimize a scalar quadratic function , the step-by-step update rule simplifies to , where is the learning rate and represents the curvature. After iterations, the position is explicitly given by . This demonstrates that the optimization converges exponentially toward the minimum at provided that the condition is met. This inequality shows that the convergence rate improves as increases until , but if the learning rate is too large such that , the sequence diverges entirely.
0
1
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L