Concept
Batch Norm in Deep Learning Implementation
In each layer, before we pass the input to the activation function, we first normalize it with Batch Norm. That is, given the pre-activations $z^{(1)}, \dots, z^{(m)}$ of a layer, we first calculate $\mu = \frac{1}{m}\sum_{i} z^{(i)}$ and $\sigma^2 = \frac{1}{m}\sum_{i} \left(z^{(i)} - \mu\right)^2$, then $z^{(i)}_{\text{norm}} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$ and $\tilde{z}^{(i)} = \gamma\, z^{(i)}_{\text{norm}} + \beta$. We introduce $\gamma$ and $\beta$ because we don't want all the inputs to neurons in hidden layers to always have mean 0 and variance 1. The layer then outputs $a^{(i)} = g(\tilde{z}^{(i)})$, where $g$ is some activation function. Recall that in each layer, $z = Wa + b$. When we calculate $z_{\text{norm}}$, we first normalize by subtracting the mean, so the value of $b$ has no influence on the result and the bias can be dropped. In each layer, we therefore have three parameters: $W$, $\gamma$, and $\beta$. Frequently, we combine this with mini-batches; i.e., we train $W^{[l]}$, $\gamma^{[l]}$, and $\beta^{[l]}$ with respect to $X^{\{t\}}$, where $X^{\{t\}}$ is the $t$-th mini-batch of the training data.
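As a minimal sketch of the forward pass described above (the function name `batch_norm_forward`, the use of NumPy, the ReLU activation, and the batch/unit sizes are illustrative assumptions, not part of the original concept):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Batch-normalize pre-activations z of shape (batch_size, n_units).

    Normalizes each unit over the mini-batch, then rescales with the
    learnable parameters gamma (scale) and beta (shift).
    """
    mu = z.mean(axis=0)                      # per-unit mean over the batch
    var = z.var(axis=0)                      # per-unit variance over the batch
    z_norm = (z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    z_tilde = gamma * z_norm + beta          # gamma, beta restore expressive power
    return z_tilde

# Hypothetical usage on one mini-batch of pre-activations z = W a + b:
rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(64, 4))  # batch of 64, 4 hidden units
gamma = np.ones(4)                                 # learnable scale, init to 1
beta = np.zeros(4)                                 # learnable shift, init to 0
a = np.maximum(0.0, batch_norm_forward(z, gamma, beta))  # ReLU as g
```

Note that adding any constant bias $b$ to `z` leaves `z - mu` unchanged, which is why the bias parameter is redundant in a batch-normalized layer.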
Updated 2020-11-30
Tags
Data Science