1Cademy - Gradient Descent Convergence on a Scalar Quadratic

Learn Before

One-Dimensional Gradient Descent on a Quadratic

Theory


When gradient descent is applied to minimize the scalar quadratic $f(x) = \frac{\lambda}{2} x^2$ with curvature $\lambda > 0$, the gradient is $f'(x) = \lambda x$. The update rule therefore simplifies to $x_{t+1} = x_t - \eta \lambda x_t = (1 - \eta \lambda) x_t$, where $\eta > 0$ is the learning rate. Unrolling this recursion over $t$ iterations gives the closed form $x_t = (1 - \eta \lambda)^t x_0$. For any starting point $x_0 \neq 0$, the iterates converge exponentially (geometrically) to the minimum at $x = 0$ exactly when $|1 - \eta \lambda| < 1$, which is equivalent to $0 < \eta \lambda < 2$. Within this range, convergence speeds up as $\eta$ increases until $\eta \lambda = 1$. At that point the contraction factor is zero and the minimum is reached in a single step. For $1 < \eta \lambda < 2$, the iterates overshoot and alternate in sign while their magnitude still shrinks. At $\eta \lambda = 2$ they oscillate between $x_0$ and $-x_0$ without converging. If the learning rate is so large that $\eta \lambda > 2$, then $|x_t|$ grows without bound and the sequence diverges.
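The recursion and its closed form can be checked numerically. The sketch below assumes $\lambda > 0$ and uses illustrative values ($\lambda = 1$, $x_0 = 1$, and a handful of learning rates). It is not taken from the source. The helper names `gd_quadratic` and `closed_form` are hypothetical. The code iterates the update $x_{t+1} = x_t - \eta \lambda x_t$ directly and compares the result with $(1 - \eta \lambda)^t x_0$ across the regimes $\eta\lambda < 1$, $\eta\lambda = 1$, $1 < \eta\lambda < 2$, $\eta\lambda = 2$, and $\eta\lambda > 2$.

```python
def gd_quadratic(x0: float, eta: float, lam: float, steps: int) -> list[float]:
    """Run gradient descent on f(x) = (lam / 2) * x**2; return iterates x_0..x_steps."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = lam * x       # f'(x) = lam * x
        x = x - eta * grad   # same as x = (1 - eta * lam) * x
        xs.append(x)
    return xs


def closed_form(x0: float, eta: float, lam: float, t: int) -> float:
    """Closed-form iterate x_t = (1 - eta * lam)**t * x0."""
    return (1 - eta * lam) ** t * x0


if __name__ == "__main__":
    # Illustrative values (assumptions): curvature lam = 1, start x0 = 1.
    lam, x0, steps = 1.0, 1.0, 10
    for eta in (0.5, 1.0, 1.5, 2.0, 2.5):
        xs = gd_quadratic(x0, eta, lam, steps)
        factor = 1 - eta * lam
        status = "converges" if abs(factor) < 1 else "does not converge"
        print(f"eta*lam={eta * lam:.1f}  factor={factor:+.2f}  "
              f"x_{steps}={xs[-1]:.6g}  closed={closed_form(x0, eta, lam, steps):.6g}  {status}")
```

With these values, $\eta\lambda = 1$ reaches $0$ after one step. $\eta\lambda = 1.5$ flips sign on each step while shrinking. $\eta\lambda = 2$ alternates between $\pm 1$. $\eta\lambda = 2.5$ grows in magnitude by a factor of $1.5$ per step.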


Updated 2026-06-25


References

Dive into Deep Learning

Learn After

Momentum Convergence on a Scalar Quadratic
