Deep Learning Weight Initialization
Every weight in the network must be given an initial value before the first forward pass. Two naive options are to set all weights to zero or to draw them from an unscaled random distribution. Zero initialization fails to break symmetry between neurons, and poorly scaled random initialization can produce vanishing or exploding gradients; either way, the model becomes difficult to train. To mitigate this, you can use a heuristic that scales the random weights by the size of each layer (its number of inputs and outputs, or fan-in and fan-out). A common heuristic for the Tanh activation is Xavier initialization, which draws each layer's weights with variance 2 / (fan_in + fan_out).
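As a minimal sketch, here is Xavier (Glorot) uniform initialization for one layer in NumPy; the layer sizes 784 and 256 are illustrative, not from this note:

    import numpy as np

    def xavier_init(fan_in, fan_out):
        # Xavier/Glorot uniform: variance 2 / (fan_in + fan_out), which keeps
        # activation variance roughly constant across Tanh layers.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

    W1 = xavier_init(784, 256)   # e.g., first layer of an MNIST-sized network
    print(W1.std())              # roughly sqrt(2 / (784 + 256)) ~= 0.044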
Alternatively, if the same architecture has already been trained by someone else (for example, on ImageNet), the weights can be initialized to those pretrained values. This is known as transfer learning, and it can substantially speed up training and improve model accuracy, as sketched below.
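A minimal sketch of this idea, assuming PyTorch and torchvision; the ResNet-18 backbone, the weight identifier, the frozen backbone, and the 10-class head are illustrative assumptions, not part of this note:

    import torch.nn as nn
    import torchvision.models as models

    # Start from ImageNet-pretrained weights instead of random initialization.
    model = models.resnet18(weights="IMAGENET1K_V1")

    # Optionally freeze the pretrained layers so only the new head is trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the classifier head for the new task (10 classes is hypothetical).
    model.fc = nn.Linear(model.fc.in_features, 10)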
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Forward Propagation
Update Weight Iteratively Until Convergence
Deep Learning Weight Initialization
What is the "cache" used for in our implementation of forward propagation and backward propagation?
Consider the following 1 hidden layer neural network:
Which of the following are true regarding activation outputs and vectors? (Check all that apply.)
Backpropagation
Objective Function
Depth and Width for Neural Networks
Dropout
Neural Network Learning Rate
Epochs in Machine Learning
Activation Functions in Neural Networks
Deep Learning Optimizer Algorithms
Batch Normalization in Deep Learning
Hyperparameters Tuning Methods in Deep Learning
Difference between Model Parameter and Model Hyperparameter
Regularization Constant
Learn After
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
How to Initialize Weights to Prevent Vanishing/Exploding Gradients
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization