Concept
Escaping Local Minima
Because the gradient is zero at a local minimum and close to zero near it, a gradient-based optimization algorithm can become trapped there even when a better solution exists elsewhere. Introducing some degree of noise into the updates can knock the model's parameters out of these suboptimal valleys. One beneficial property of minibatch stochastic gradient descent is that it supplies this noise on its own: gradients evaluated over different minibatches vary statistically, and that variation can dislodge the parameters from a local minimum.
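The effect can be illustrated with a minimal sketch on a one-dimensional toy objective. Everything here is an assumption made for illustration: the objective f(x) = x cos(πx), the starting point, the learning rate, the clipping interval, and above all the use of Gaussian noise added to the exact gradient as a stand-in for the variation between minibatches. The names `f`, `grad_f`, and `descend` are hypothetical helpers, not part of any library.

```python
import math
import random


def f(x):
    """Toy non-convex objective: shallow minimum near x = -0.27, deeper one near x = 1.09."""
    return x * math.cos(math.pi * x)


def grad_f(x):
    """Exact derivative of f."""
    return math.cos(math.pi * x) - math.pi * x * math.sin(math.pi * x)


def descend(x0, lr=0.05, steps=200, noise_std=0.0, seed=0, lo=-1.0, hi=2.0):
    """Gradient descent with optional Gaussian noise on each gradient.

    noise_std=0.0 gives plain, deterministic gradient descent.
    The noise is an assumed stand-in for minibatch-to-minibatch variation.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        g = grad_f(x) + rng.gauss(0.0, noise_std)  # noisy gradient estimate
        x = x - lr * g
        x = min(max(x, lo), hi)  # assumption: clip to the region of interest
    return x


if __name__ == "__main__":
    x_plain = descend(-0.6)
    print(f"plain GD : x = {x_plain:.3f}, f(x) = {f(x_plain):.3f}")
    for seed in range(5):
        x_noisy = descend(-0.6, noise_std=3.0, seed=seed)
        print(f"noisy GD (seed {seed}): x = {x_noisy:.3f}, f(x) = {f(x_noisy):.3f}")
```

Started at x = -0.6, the noise-free run settles in the shallow minimum near x = -0.27, where the gradient vanishes. Whether a particular noisy run reaches the deeper basin depends on the seed, the noise scale, and the learning rate, so the printed results are best read as samples rather than a guarantee.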
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L