Learn Before
Variance of Layer Output in Forward Propagation
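When analyzing the scale distribution of an output $o_i$ for a fully connected layer without nonlinearities, the output is computed as $o_i = \sum_{j=1}^{n_\text{in}} w_{ij} x_j$, where $n_\text{in}$ is the number of inputs $x_j$ and $w_{ij}$ are the associated weights. Assuming the inputs and weights are drawn independently with a mean of $0$ and variances of $\gamma^2$ and $\sigma^2$ respectively, the expected value is $E[o_i] = 0$. We can compute the variance as $\mathrm{Var}[o_i] = E[o_i^2] - (E[o_i])^2 = \sum_{j=1}^{n_\text{in}} E[w_{ij}^2] E[x_j^2] = n_\text{in} \sigma^2 \gamma^2$. Note that the distribution does not have to be Gaussian, but the mean and variance must exist. To keep this variance fixed during forward propagation and prevent it from changing across layers, the initialization must satisfy the condition $n_\text{in} \sigma^2 = 1$.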
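The variance relationship can be checked numerically. The sketch below is an illustration, not part of the original text: the helper name `output_variance`, the sample count, and the choice of a uniform weight distribution are all assumptions. It draws zero-mean weights with variance `sigma**2` and zero-mean inputs with variance `gamma**2`, forms the output of a single linear unit with `n_in` inputs, and estimates the variance of that output, which should come out close to `n_in * sigma**2 * gamma**2`.

```python
import numpy as np


def output_variance(n_in, sigma, gamma, n_samples=10_000, seed=0):
    """Estimate Var[o] for o = sum_j w_j * x_j by Monte Carlo sampling.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Weights: zero mean, variance sigma^2. A uniform distribution on
    # [-a, a] has variance a^2 / 3, so a = sigma * sqrt(3). Uniform is
    # used deliberately to show the result does not require Gaussians.
    half_width = sigma * np.sqrt(3.0)
    w = rng.uniform(-half_width, half_width, size=(n_samples, n_in))

    # Inputs: zero mean, variance gamma^2, drawn independently of w.
    x = rng.normal(0.0, gamma, size=(n_samples, n_in))

    # One output per sample: a linear unit with no nonlinearity.
    o = np.sum(w * x, axis=1)
    return o.var()


n_in, gamma = 100, 1.0

# sigma chosen so that n_in * sigma^2 = 1: output variance stays near gamma^2.
print(output_variance(n_in, sigma=1.0 / np.sqrt(n_in), gamma=gamma))

# A larger sigma inflates the variance by the factor n_in * sigma^2 (here 4).
print(output_variance(n_in, sigma=0.2, gamma=gamma))
```

With `sigma = 1 / sqrt(n_in)` the estimate should stay near the input variance, while the second call should land near four times that value, which is the growth that would compound across layers if the condition were not met.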
Tags
D2L
Dive into Deep Learning @ D2L
Related
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization
Built-in Gaussian Parameter Initialization
Constant Parameter Initialization
Block-Specific Parameter Initialization
Forced Parameter Reinitialization
Custom Parameter Initialization
Direct Parameter Assignment
Lazy Parameter Initialization
How to Initialize Weights to Prevent Vanishing/Exploding Gradients