1Cademy - Vanishing Gradient of the Tanh Activation Function

Learn Before

Vanishing/exploding gradient

Concept

Vanishing Gradient of the Tanh Activation Function

When minimizing an objective function that involves the hyperbolic tangent ($\tanh$) activation function, optimization can stall due to the vanishing gradient problem. For example, suppose an algorithm attempts to minimize $f(x) = \tanh(x)$ starting at $x = 4$. Since the derivative is $f'(x) = 1 - \tanh^2(x)$, the gradient there evaluates to $f'(4) \approx 0.0013$, so each update moves $x$ only slightly. Consequently, the optimization process gets stuck and makes negligible progress for a long time. This severe saturation issue is one of the primary reasons training deep learning models was notoriously tricky before the widespread adoption of the ReLU activation function.
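The stall can be reproduced numerically. The following is a minimal sketch, not taken from the source: it runs plain gradient descent on $f(x) = \tanh(x)$ from $x = 4$, with a learning rate of 0.1 and 100 steps chosen purely for illustration. The helper names `tanh_grad` and `gradient_descent` are hypothetical.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: f'(x) = 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def gradient_descent(x0, lr, steps):
    """Plain gradient descent on f(x) = tanh(x); returns every iterate."""
    x = x0
    history = [x]
    for _ in range(steps):
        x -= lr * tanh_grad(x)  # step against the gradient
        history.append(x)
    return history


# Assumed settings for illustration: learning rate 0.1, 100 steps.
grad_at_4 = tanh_grad(4.0)
grad_at_0 = tanh_grad(0.0)
history = gradient_descent(x0=4.0, lr=0.1, steps=100)

print(f"f'(4) = {grad_at_4:.4f}")
print(f"f'(0) = {grad_at_0:.4f}")
print(f"x after {len(history) - 1} steps: {history[-1]:.4f}")
```

Under these assumed settings, $x$ stays very close to 4 after all 100 updates, whereas the gradient at $x = 0$ is 1, roughly three orders of magnitude larger. The contrast between the saturated and unsaturated regions is what makes the starting point matter so much.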

Updated 2026-05-15


References

Dive into Deep Learning
