Learn Before
Variance of Layer Output in Forward Propagation
When analyzing the scale distribution of the output of a fully connected layer without nonlinearities, the output is computed as $o_i = \sum_{j=1}^{n_\mathrm{in}} w_{ij} x_j$. Assuming the inputs $x_j$ and weights $w_{ij}$ are drawn independently with a mean of $0$ and variances of $\gamma^2$ and $\sigma^2$ respectively, the expected value is $E[o_i] = \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}]\,E[x_j] = 0$. We can compute the variance as $\mathrm{Var}[o_i] = E[o_i^2] - (E[o_i])^2 = \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}^2]\,E[x_j^2] = n_\mathrm{in}\,\sigma^2 \gamma^2$. Note that the distributions do not have to be Gaussian; only the mean and variance need to exist. To keep this variance fixed during forward propagation and prevent it from changing across layers, the initialization must satisfy the condition $n_\mathrm{in}\,\sigma^2 = 1$.
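A minimal sketch of this result, assuming Gaussian draws and the illustrative sizes chosen below: it builds one linear layer output $O = XW$ and checks that the empirical variance matches the predicted $n_\mathrm{in}\,\sigma^2\gamma^2$, which equals $1$ when $\sigma^2 = 1/n_\mathrm{in}$.

```python
import numpy as np

# Hypothetical sizes for illustration only.
n_in, n_out, n_samples = 512, 256, 10_000
gamma = 1.0                   # std of the inputs, so Var[x_j] = gamma^2
sigma = 1.0 / np.sqrt(n_in)   # std of the weights, chosen so n_in * sigma^2 = 1

rng = np.random.default_rng(0)
X = rng.normal(0.0, gamma, size=(n_samples, n_in))  # inputs: mean 0, var gamma^2
W = rng.normal(0.0, sigma, size=(n_in, n_out))      # weights: mean 0, var sigma^2
O = X @ W                                           # layer output, no nonlinearity

print(f"predicted Var[o_i]: {n_in * sigma**2 * gamma**2:.4f}")  # 1.0000
print(f"empirical Var[o_i]: {O.var():.4f}")                     # close to 1.0
```

Repeating this with, say, sigma = 0.05 or 0.1 shows the output variance scaling as $n_\mathrm{in}\,\sigma^2\gamma^2$, so stacking such layers would shrink or blow up activations unless the condition above holds.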
Tags
D2L
Dive into Deep Learning @ D2L
Related
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
How to Initialize Weights to Prevent Vanishing/Exploding Gradients
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization