One-Dimensional Gradient Descent

One-dimensional gradient descent provides a clear illustration of why moving in the negative gradient direction reduces the objective function. For a continuously differentiable function $f: \mathbb{R} \rightarrow \mathbb{R}$, the first-order Taylor expansion gives $f(x + \epsilon) = f(x) + \epsilon f'(x) + \mathcal{O}(\epsilon^2)$. Setting the step to $\epsilon = -\eta f'(x)$, where $\eta > 0$ is a fixed learning rate, yields $f(x - \eta f'(x)) = f(x) - \eta f'^2(x) + \mathcal{O}(\eta^2 f'^2(x))$. When the derivative $f'(x) \neq 0$, the term $\eta f'^2(x) > 0$ guarantees a decrease in $f$, provided $\eta$ is small enough for the higher-order terms to be negligible. This leads to the update rule $x \leftarrow x - \eta f'(x)$, which is applied iteratively from an initial value until a stopping condition is met, such as when the gradient magnitude $|f'(x)|$ becomes sufficiently small or a maximum number of iterations is reached.
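The update rule and stopping conditions described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gradient_descent_1d`, the tolerance `tol`, the cap `max_iters`, and the example objective $f(x) = x^2$ with $\eta = 0.2$ and starting point $x = 10$ are all assumptions chosen for demonstration.

```python
def gradient_descent_1d(f_grad, x0, eta=0.2, tol=1e-6, max_iters=1000):
    """Iterate x <- x - eta * f'(x) from x0 (illustrative sketch).

    Stops when |f'(x)| < tol or after max_iters updates.
    Returns the final x and the list of all visited points.
    """
    x = x0
    history = [x]
    for _ in range(max_iters):
        g = f_grad(x)
        if abs(g) < tol:      # gradient magnitude is sufficiently small
            break
        x = x - eta * g       # step in the negative gradient direction
        history.append(x)
    return x, history


# Assumed example objective: f(x) = x^2, so f'(x) = 2x.
def f(x):
    return x ** 2


def f_grad(x):
    return 2 * x


x_min, history = gradient_descent_1d(f_grad, x0=10.0, eta=0.2)
print(f"final x = {x_min:.8f} after {len(history) - 1} updates")
```

For this particular objective each update multiplies $x$ by $1 - 2\eta$, so with $\eta = 0.2$ the iterates shrink toward the minimizer at $0$ and $f$ decreases at every step. A much larger $\eta$ (for example $\eta = 1.1$ here) would make the higher-order terms dominate and the iterates diverge, which is why the argument above requires $\eta$ to be small enough.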

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L