Concept

Vanishing/exploding gradient

In a neural network with many layers or time steps, the gradient at an early layer is the product of terms from all later layers, which is an inherently unstable situation. When most of those terms are smaller than one, the product shrinks exponentially: the gradient becomes so small that the early layers no longer update properly and it eventually vanishes. The exploding gradient is the opposite failure mode: when the terms are larger than one, the product grows exponentially, the gradient-descent weight updates become so large that the whole network destabilizes, and the values can reach numerical overflow.
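
To make the instability concrete, here is a minimal numeric sketch in PyTorch (which D2L's examples use); the function `backprop_norms` and its parameters are hypothetical names chosen for illustration, not code from the book. It repeatedly multiplies a gradient vector by random Jacobian-like matrices, mimicking how backpropagation chains one term per layer: with a per-layer scale below one the norm decays toward zero (vanishing), and with a scale above one it grows without bound (exploding).

```python
import torch

torch.manual_seed(0)

def backprop_norms(scale, steps=60, width=4):
    # Multiply a "gradient" vector by one random Jacobian per layer,
    # mimicking how backpropagation chains terms across many layers.
    grad = torch.ones(width)
    norms = []
    for _ in range(steps):
        # Entries ~ N(0, scale^2 / width), so each step rescales the
        # gradient norm by roughly `scale` on average.
        J = scale * torch.randn(width, width) / width ** 0.5
        grad = J @ grad
        norms.append(grad.norm().item())
    return norms

# Scale < 1: the norm decays exponentially toward zero (vanishing).
print("scale 0.5:", backprop_norms(0.5)[::15])
# Scale > 1: the norm grows exponentially (exploding; with enough
# layers it would eventually overflow float32).
print("scale 2.0:", backprop_norms(2.0)[::15])
```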


Updated 2026-05-06

Tags

Data Science

D2L

Dive into Deep Learning @ D2L
