Reason for Gradient Clipping

RNNs compose strongly nonlinear functions over many timesteps, so their derivatives can have either very small (vanishing) or very large (exploding) magnitudes. The exploding case shows up in the objective function itself: its "landscape" contains steep "cliff" regions where the gradient is enormous, so a single gradient-descent step taken there can throw the parameters far away. This makes it very difficult to choose one step size (learning rate) that behaves well everywhere, which motivates clipping the gradient.
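Norm-based clipping addresses this by rescaling the gradient whenever its L2 norm exceeds a threshold, preserving its direction but capping the step length. A minimal sketch (the threshold and gradient values here are illustrative):

```python
import numpy as np

def clip_by_norm(grad, threshold):
    # Rescale the gradient if its L2 norm exceeds the threshold:
    # g <- g * (threshold / ||g||), so the clipped norm equals the threshold.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Near a "cliff" the raw gradient can be enormous; clipping caps the step
# while keeping its direction.
g = np.array([300.0, 400.0])              # ||g|| = 500
clipped = clip_by_norm(g, threshold=5.0)  # -> [3.0, 4.0], norm 5
```

Because only the magnitude is capped, the update still points downhill; the parameters move a bounded distance instead of being catapulted off the cliff.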


Updated 2021-07-29

Tags

Data Science
