1Cademy - Vanishing/exploding gradient

Learn Before

Backpropagation Through Time (BPTT)
Gradient Descent
Depth and Width for Neural Networks
Deep Learning Weight Initialization
Overflow and Underflow in Computer Science
Vexing Optimization Challenges in Deep Learning

Concept

Vanishing/exploding gradient

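In a neural network with many layers (or, in a recurrent network trained with backpropagation through time, many time steps), the chain rule makes the gradient at an early layer the product of the local derivatives contributed by every later layer. Multiplying many factors together is inherently unstable. If those factors are mostly smaller than 1 in magnitude, the gradient shrinks exponentially with depth. This is the vanishing gradient problem: the early layers receive updates so small that their weights barely change, and the gradient can even underflow to zero, so those layers effectively stop learning.

Exploding gradients are the opposite case. If the factors are mostly larger than 1 in magnitude, the gradient grows exponentially, so gradient descent makes extremely large weight updates. These updates make the whole network unstable and can push values beyond the floating-point range, causing numerical overflow.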
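To make the multiplication argument concrete, here is a minimal Python sketch. It assumes a hypothetical scalar chain in which each layer (or each BPTT time step) contributes one local derivative, so the chain rule reduces to a plain product. `backprop_through_chain` and `sigmoid_derivative` are illustrative helpers written for this example, not library functions, and the per-layer weights are made-up values.

```python
import math

def backprop_through_chain(local_derivatives):
    """Chain rule for a scalar chain (hypothetical helper): d(output)/d(input)
    is the product of every layer's local derivative."""
    grad = 1.0
    for d in local_derivatives:
        grad *= d  # one multiplication per layer (or per BPTT time step)
    return grad

def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid; its maximum is 0.25, reached at z == 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

depth = 50
# Assumed setup: every layer has the same weight w and an identity activation,
# so each local derivative is simply w.
for w in (0.5, 1.0, 1.5):
    grad = backprop_through_chain([w] * depth)
    print(f"w={w}: gradient after {depth} layers = {grad:.3e}")

# Sigmoid units: even in the best case (z = 0) each factor is only 0.25.
grad = backprop_through_chain([sigmoid_derivative(0.0)] * depth)
print(f"sigmoid: gradient after {depth} layers = {grad:.3e}")

# Much deeper chains (or longer sequences): float64 underflows or overflows.
print(backprop_through_chain([0.5] * 2000))  # 0.0 -> the gradient has vanished
print(backprop_through_chain([1.5] * 2000))  # inf -> numerical overflow
```

Factors below 1 shrink the gradient exponentially with depth, factors above 1 grow it exponentially, and only factors close to 1 keep it stable. Real networks multiply Jacobian matrices rather than scalars, so the relevant quantity there is roughly the size of those Jacobians, but the same exponential behavior appears.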