Theory

Gradient Descent Convergence on a Scalar Quadratic

When applying gradient descent to minimize a scalar quadratic function f(x)=λ2x2f(x) = \frac{\lambda}{2} x^2, the step-by-step update rule simplifies to xt+1=xtηλxt=(1ηλ)xtx_{t+1} = x_t - \eta \lambda x_t = (1 - \eta \lambda) x_t, where η\eta is the learning rate and λ\lambda represents the curvature. After tt iterations, the position is explicitly given by xt=(1ηλ)tx0x_t = (1 - \eta \lambda)^t x_0. This demonstrates that the optimization converges exponentially toward the minimum at x=0x=0 provided that the condition 1ηλ<1|1 - \eta \lambda| < 1 is met. This inequality shows that the convergence rate improves as η\eta increases until ηλ=1\eta \lambda = 1, but if the learning rate is too large such that ηλ>2\eta \lambda > 2, the sequence diverges entirely.

0

1

Updated 2026-05-15

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L