Learn Before
Concept

Mathematical Mechanism of Vanishing and Exploding Gradients in Recurrent Neural Networks

Vanishing and exploding gradients are common problems in recurrent neural networks. Consider a network where an input is multiplied by a weight matrix W\mathbf{W} for tt time steps. Let Wt\mathbf{W}^t have the eigendecomposition VΛtV1\mathbf{V} \Lambda^t \mathbf{V}^{-1}, where Λ\Lambda is a diagonal matrix of eigenvalues. We can see that if an eigenvalue λ>1\lambda > 1, the result will approach \infty as tt gets large, leading to an exploding gradient. Conversely, if λ<1\lambda < 1, the result will approach 00 as tt gets large, resulting in a vanishing gradient.

0

2

Updated 2026-05-18

References


Tags

Data Science