Concept

Escaping Local Minima

Because the gradient vanishes at a local minimum, a gradient-based optimizer that reaches one can become trapped there. Adding some noise to the updates can knock the model's parameters out of such sub-optimal valleys. Minibatch stochastic gradient descent supplies this noise naturally: gradients evaluated on different minibatches vary statistically, and that variation can be enough to dislodge the parameters from a local minimum.
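The effect can be illustrated with a small one-dimensional sketch. Everything in it is an assumption made for illustration rather than something taken from the source: the double-well objective f(x) = x**4 - 2*x**2 + 0.5*x, the hypothetical helper names grad and descend, the step size, and the use of Gaussian noise as a stand-in for minibatch-to-minibatch gradient variation. The objective has a shallow local minimum near x = 0.93 and a deeper minimum near x = -1.06.

```python
import random


def grad(x):
    # Derivative of the illustrative objective f(x) = x**4 - 2*x**2 + 0.5*x.
    return 4 * x**3 - 4 * x + 0.5


def descend(x0, lr=0.02, steps=5000, noise_std=0.0, seed=0):
    """Run gradient descent from x0, optionally with noisy gradients.

    noise_std=0.0 gives plain gradient descent. A positive value adds
    zero-mean Gaussian noise to every gradient, which is only a stand-in
    (an assumption) for the variation between minibatch gradients.
    Returns the final x and the lowest x visited along the way.
    """
    rng = random.Random(seed)
    x = x0
    lowest = x
    for _ in range(steps):
        noisy_gradient = grad(x) + rng.gauss(0.0, noise_std)
        x -= lr * noisy_gradient
        lowest = min(lowest, x)
    return x, lowest


if __name__ == "__main__":
    # Plain gradient descent started in the shallow basin stays there.
    x_plain, _ = descend(1.0)
    print("plain GD final x:", round(x_plain, 3))

    # Noisy runs: count how many trajectories crossed into the deeper basin.
    escaped = sum(descend(1.0, noise_std=6.0, seed=s)[1] < -0.5 for s in range(10))
    print("noisy runs that reached the deeper basin:", escaped, "of 10")
```

With a constant noise level a trajectory can also hop back out of the deeper basin, so the sketch records whether the deeper basin was ever visited rather than where the run ends. In practice the noise is usually reduced over time, for example by decaying the learning rate, so that the parameters settle.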

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L