Learn Before
Concept
Reason for Gradient Clipping
Because RNNs apply the same strongly nonlinear transformation over many timesteps, their derivatives can have either very small (vanishing) or very large (exploding) magnitudes. As a result, the objective function's "landscape" contains steep "cliffs": near a cliff the gradient is enormous, and a single gradient-descent step can throw the parameters far from the current solution. This makes it very difficult to find one step size (learning rate) that works everywhere, which motivates clipping the gradient before each update.
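The idea above can be sketched with clip-by-norm, one common form of gradient clipping: if the gradient's L2 norm exceeds a threshold, rescale it so the update keeps its direction but has bounded length. This is a minimal NumPy sketch; the function name and the `max_norm` threshold are illustrative choices, not from the original card.

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    """Rescale grad so its L2 norm does not exceed max_norm.

    Near a "cliff" the raw gradient can be huge; clipping keeps the
    update direction but bounds its length, so one learning rate
    remains usable across the loss landscape.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# A huge gradient (as if computed at a cliff) is scaled down,
# while an ordinary gradient passes through unchanged.
huge = np.array([300.0, 400.0])           # norm = 500
small = np.array([0.3, 0.4])              # norm = 0.5
print(clip_by_norm(huge, max_norm=5.0))   # → [3. 4.] (norm 5, same direction)
print(clip_by_norm(small, max_norm=5.0))  # → [0.3 0.4] (unchanged)
```

Deep-learning frameworks ship equivalents (e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch), typically applied between the backward pass and the optimizer step.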
Updated 2021-07-29
Tags
Data Science