Vanishing Gradient of the Tanh Activation Function

When an objective built on the hyperbolic tangent ($\tanh$) activation function is minimized, optimization can stall because of the vanishing gradient problem. For example, suppose an algorithm tries to minimize $f(x) = \tanh(x)$ starting at $x = 4$. Since the derivative is $f'(x) = 1 - \tanh^2(x)$, the gradient at the starting point is $f'(4) \approx 0.0013$, which is extremely small. Consequently, each gradient step barely moves $x$, and the optimization makes negligible progress for a long time. This severe saturation is one of the primary reasons training deep learning models was notoriously tricky before the widespread adoption of the ReLU activation function.
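The example above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper names `tanh_grad` and `gradient_descent`, the learning rate of 0.1, and the step count of 100 are illustrative assumptions that do not come from the original text.

```python
import math


def tanh_grad(x):
    """Derivative of tanh(x), i.e. 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def gradient_descent(x0, lr, steps):
    """Plain gradient descent on f(x) = tanh(x), starting from x0.

    lr and steps are hypothetical settings chosen only for illustration.
    """
    x = x0
    for _ in range(steps):
        x -= lr * tanh_grad(x)  # the step size is scaled by a tiny gradient
    return x


# Gradient at the starting point used in the text (about 0.0013).
grad_at_4 = tanh_grad(4.0)

# Position after 100 gradient steps with an assumed learning rate of 0.1.
x_after = gradient_descent(4.0, lr=0.1, steps=100)

print(f"f'(4) = {grad_at_4:.4f}")
print(f"x after 100 steps = {x_after:.4f}")
```

Under these assumed settings, $x$ moves by only about 0.01 after 100 updates, because every step is multiplied by a gradient on the order of $10^{-3}$.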

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L