Learn Before
Concept

Impact of Exploding Gradients on Model Training

When gradients explode during neural network training, the gradient norm g\|\mathbf{g}\| becomes excessively large. In such worst-case scenarios, a single gradient step can undo the progress made over the course of thousands of training iterations. Consequently, training often diverges, entirely failing to reduce the value of the objective function. Even in cases where training eventually converges, the process remains highly unstable due to massive spikes in the loss.

0

1

Updated 2026-05-14

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L