Vanishing/exploding gradient
In a neural network with many layers or time steps, the gradient at an early layer is the product of terms contributed by all later layers, which makes training inherently unstable. When those terms are mostly smaller than one, the product shrinks exponentially: the gradient becomes so small that the early layers stop updating properly and the gradient eventually vanishes. The exploding gradient is the opposite case: when the terms are larger than one, the product grows exponentially, the weight updates produced by gradient descent become so large that the whole network becomes unstable, and the values can reach numerical overflow.
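
A minimal sketch of this effect, assuming NumPy, with random 4x4 matrices standing in for the per-layer terms (the matrix size and scale values here are illustrative choices, not from the note): multiplying a unit gradient by one such matrix per layer drives its norm toward zero or blows it up, depending on the scale of the terms.

import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after(num_layers, scale):
    # Start from a unit "gradient" and multiply by one random matrix per
    # layer, as backpropagation through depth/time would chain the terms.
    grad = np.ones(4)
    for _ in range(num_layers):
        layer_term = scale * rng.standard_normal((4, 4))
        grad = layer_term @ grad
    return np.linalg.norm(grad)

print(gradient_norm_after(50, scale=0.1))  # shrinks toward 0: vanishing
print(gradient_norm_after(50, scale=1.0))  # grows enormous: exploding

Running the two calls shows the two regimes: with small per-layer terms the norm collapses to essentially zero after a few dozen layers, while with larger terms it grows by orders of magnitude.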

Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Vanishing/exploding gradient
RNN Backpropagation Formulas
Helpful Website for BPTT
Teacher forcing
Weight tying
Gradient Descent Reference
Linear Regression and Gradient Descent
Numerical Approximation of Gradients
Gradient Checking
(Batch) Gradient Descent (Deep Learning Optimization Algorithm)
Gradient Descent Explained
Why Gradient descent might fail?
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Big Data to Good Data: Andrew Ng Urges ML Community To Be More Data-Centric and Less Model-Centric
MLOps: Data-centric and Model-centric approaches
Critical Points
First-order Optimization Algorithm
Second-order Optimization Algorithm
Method of Steepest Descent
Second-Order Gradient Methods
Gradient Descent Explanation
Gradient Descent Variants
Notes about gradient descent
Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
BERT Training Process
Objective Function
Distributed Training
The Problem with Constant Initialization
Effect of Depth for Neural Networks
Measuring the depth of the model
Example of Weight Initialization
Symmetry Breaking in Deep Learning
How to Initialize Weights to Prevent Vanishing/Exploding Gradients
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization