Learn Before
Impact of Exploding Gradients on Model Training
When gradients explode during neural network training, the gradient norm becomes excessively large. In such worst-case scenarios, a single gradient step can undo the progress made over the course of thousands of training iterations. Consequently, training often diverges, entirely failing to reduce the value of the objective function. Even in cases where training eventually converges, the process remains highly unstable due to massive spikes in the loss.
0
1
Tags
D2L
Dive into Deep Learning @ D2L
Related
Solutions for vanishing/exploding gradient
A Gentle Introduction to Exploding Gradients in Neural Networks
Zero Weight Initialization in Feed-Forward Networks
Impact of Exploding Gradients on Model Training
Vanishing Gradient of the Tanh Activation Function
Reparametrization to Mitigate Stalling Optimization
Mathematical Mechanism of Vanishing and Exploding Gradients in Recurrent Neural Networks