Learn Before
Concept

When are exploding/vanishing gradients a problem?

Exploding/vanishing gradients are a common problem in recurrent neural networks. Consider a network in which an input is multiplied by a matrix $\bold{W}$ a total of $t$ times, and let $\bold{W}$ have eigendecomposition $\bold{V}\,\text{diag}(\bold{\lambda})\,\bold{V}^{-1}$, so that $\bold{W}^t = \bold{V}\,\text{diag}(\bold{\lambda})^t\,\bold{V}^{-1}$.
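The diagonal power appears because the change-of-basis factors cancel in the repeated product: $\bold{W}^t = (\bold{V}\,\text{diag}(\bold{\lambda})\,\bold{V}^{-1})^t = \bold{V}\,\text{diag}(\bold{\lambda})\,(\bold{V}^{-1}\bold{V})\,\text{diag}(\bold{\lambda})\cdots\bold{V}^{-1} = \bold{V}\,\text{diag}(\bold{\lambda})^t\,\bold{V}^{-1}$, since each interior $\bold{V}^{-1}\bold{V}$ is the identity.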

We can see that if an eigenvalue satisfies $|\lambda| > 1$, the corresponding component of the result grows toward $\infty$ as $t$ gets large. This is an exploding gradient. Conversely, if $|\lambda| < 1$, that component decays toward $0$ as $t$ gets large. This is a vanishing gradient.
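A minimal numerical sketch of this effect (using a hypothetical diagonal $2 \times 2$ $\bold{W}$ chosen so one eigenvalue is above 1 and the other below 1, not any particular network's weights):

```python
import numpy as np

# Hypothetical W with eigenvalues 1.5 (exploding) and 0.5 (vanishing).
W = np.array([[1.5, 0.0],
              [0.0, 0.5]])

x = np.array([1.0, 1.0])  # input with weight in both eigendirections
for t in (1, 10, 50):
    # np.linalg.matrix_power computes W^t
    y = np.linalg.matrix_power(W, t) @ x
    print(f"t={t:>2}: {y}")

# By t=50 the first component is 1.5**50 ~ 6.4e8 (exploding),
# while the second is 0.5**50 ~ 8.9e-16 (vanishing).
```

Repeatedly multiplying by $\bold{W}$ scales each eigendirection by its eigenvalue once per step, which is why the components diverge to $\infty$ or collapse to $0$ exponentially in $t$.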

Updated 2021-06-24

Tags

Data Science