Concept

Implementing Gradient Descent with Batch Norm in Deep Learning

For $t = 1, \dots, n$ mini-batches:

- Compute forward propagation on $X^{\{t\}}$.
- In each hidden layer, use Batch Norm to replace $z^{(l)}$ with $\tilde{z}^{(l)}$.
- Use back propagation to compute $dW^{(l)}$, $d\gamma^{(l)}$, $d\beta^{(l)}$.
- Update the parameters with learning rate $\alpha$:
  - $W^{(l)} := W^{(l)} - \alpha \, dW^{(l)}$
  - $\gamma^{(l)} := \gamma^{(l)} - \alpha \, d\gamma^{(l)}$
  - $\beta^{(l)} := \beta^{(l)} - \alpha \, d\beta^{(l)}$

Because Batch Norm subtracts the batch mean of $z^{(l)}$, any bias term $b^{(l)}$ would cancel out; it is dropped, and $\beta^{(l)}$ plays its role instead.
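Below is a minimal NumPy sketch of this loop for a single hidden layer with ReLU and a mean-squared-error loss. Everything concrete here (layer sizes, the toy data, the learning rate, treating the full toy set as one "mini-batch") is an illustrative assumption, not part of the card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative assumption).
m, d_in, d_h = 64, 5, 8
X = rng.normal(size=(m, d_in))
y = 2.0 * X[:, :1] + 0.1 * rng.normal(size=(m, 1))

# No bias b in the first linear layer: Batch Norm's beta takes its place.
W1 = 0.1 * rng.normal(size=(d_in, d_h))
gamma = np.ones(d_h)
beta = np.zeros(d_h)
W2 = 0.1 * rng.normal(size=(d_h, 1))

alpha, eps = 0.1, 1e-5  # learning rate and numerical-stability constant

for t in range(200):  # each iteration treats the full toy set as one mini-batch
    # Forward propagation, with Batch Norm replacing z1 by z_tilde.
    z1 = X @ W1
    mu, var = z1.mean(axis=0), z1.var(axis=0)
    z_hat = (z1 - mu) / np.sqrt(var + eps)
    z_tilde = gamma * z_hat + beta
    a1 = np.maximum(z_tilde, 0.0)        # ReLU
    y_hat = a1 @ W2
    loss = ((y_hat - y) ** 2).mean()

    # Back propagation: dW2, then dgamma, dbeta, and dW1.
    dy = 2.0 * (y_hat - y) / m
    dW2 = a1.T @ dy
    dz_tilde = (dy @ W2.T) * (z_tilde > 0)

    dgamma = (dz_tilde * z_hat).sum(axis=0)
    dbeta = dz_tilde.sum(axis=0)

    # Gradient through the normalization itself (per-feature batch statistics).
    dz_hat = dz_tilde * gamma
    dz1 = (dz_hat - dz_hat.mean(axis=0)
           - z_hat * (dz_hat * z_hat).mean(axis=0)) / np.sqrt(var + eps)
    dW1 = X.T @ dz1

    # Parameter updates.
    W1 -= alpha * dW1
    gamma -= alpha * dgamma
    beta -= alpha * dbeta
    W2 -= alpha * dW2

    if t % 50 == 0:
        print(f"step {t}: loss = {loss:.4f}")
```

At test time the batch statistics (mu, var) would be replaced by running averages tracked during training; that part is omitted here to keep the sketch focused on the update loop.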



Tags: Data Science