Learn Before
Concept

How to Initialize Weights to Prevent Vanishing/Exploding Gradients

To prevent the gradients of a neural network's activations from vanishing or exploding, weight initialization strategies adhere to two fundamental rules: the mean of the activations should be exactly zero, and their variance must remain constant across all layers. By satisfying these conditions, the backpropagated gradient signal avoids being multiplied by excessively small or large values. Consequently, maintaining a zero mean and constant variance guarantees a stable gradient signal throughout the network.

0

1

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L