Learn Before
Concept
Reason for Gradient Clipping
Because RNNs apply the same strongly nonlinear transformation over many timesteps, their derivatives can have either very small (vanishing) or very large (exploding) magnitudes. As a result, the objective function's "landscape" contains steep "cliffs": near a cliff the gradient is enormous, and a single gradient-descent step can throw the parameters far from the current solution. This makes it very difficult to find one step size (learning rate) that works everywhere, which motivates clipping the gradient before each update.
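The idea above can be sketched with clip-by-norm, one common form of gradient clipping: if the gradient's L2 norm exceeds a threshold, rescale it so the update keeps its direction but has bounded length. This is a minimal NumPy sketch; the function name and the `max_norm` threshold are illustrative choices, not from the original card.

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    """Rescale grad so its L2 norm does not exceed max_norm.

    Near a "cliff" the raw gradient can be huge; clipping keeps the
    update direction but bounds its length, so one learning rate
    remains usable across the loss landscape.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# A huge gradient (as if computed at a cliff) is scaled down,
# while an ordinary gradient passes through unchanged.
huge = np.array([300.0, 400.0])           # norm = 500
small = np.array([0.3, 0.4])              # norm = 0.5
print(clip_by_norm(huge, max_norm=5.0))   # → [3. 4.] (norm 5, same direction)
print(clip_by_norm(small, max_norm=5.0))  # → [0.3 0.4] (unchanged)
```

Deep-learning frameworks ship equivalents (e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch), typically applied between the backward pass and the optimizer step.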
Updated 2021-07-29
Tags
Data Science