Back Propagation Example

The input to a 2-layer MLP (Multi-Layer Perceptron) is given as $X$, and the output is given as $Y$.

Thus, there are two parameter matrices, $W^{(1)}$ and $W^{(2)}$, for layers 1 and 2 respectively.

Layer 1 applies a ReLU nonlinearity, so the hidden output of layer 1 is $H = \max\{0, XW^{(1)}\}$.

The net (total) cost function $J$ is given by the cross-entropy cost $J_{MLE}$ plus a weight-decay regularization term $\lambda\left(\sum_{ij}(W^{(1)}_{ij})^2 + \sum_{ij}(W^{(2)}_{ij})^2\right)$:

$$J = J_{MLE} + \lambda\left(\sum_{ij}(W^{(1)}_{ij})^2 + \sum_{ij}(W^{(2)}_{ij})^2\right)$$
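A minimal NumPy sketch of the forward computation may make the setup concrete. The shapes, the `forward` name, and the choice of softmax cross-entropy (averaged over the minibatch) for $J_{MLE}$ are assumptions for illustration, not details from the note.

```python
import numpy as np

# Forward-pass sketch for the 2-layer MLP and total cost J.
# Assumed shapes: X is (n, d), Y is (n, k) one-hot targets,
# W1 is (d, m), W2 is (m, k); lam plays the role of lambda.
def forward(X, Y, W1, W2, lam):
    H = np.maximum(0.0, X @ W1)                        # H = max{0, X W1} (ReLU)
    U2 = H @ W2                                        # unnormalized log probabilities
    # numerically stable log-softmax, then cross-entropy averaged over the minibatch
    U2s = U2 - U2.max(axis=1, keepdims=True)
    log_probs = U2s - np.log(np.exp(U2s).sum(axis=1, keepdims=True))
    J_mle = -np.mean(np.sum(Y * log_probs, axis=1))    # cross-entropy cost J_MLE
    J = J_mle + lam * (np.sum(W1**2) + np.sum(W2**2))  # add the weight-decay term
    return J, J_mle, H, U2
```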

This setup produces the computational graph shown in the image below.

Compute $\nabla_{W^{(1)}} J$ and $\nabla_{W^{(2)}} J$.

Backpropagation in this example is simple on the weight-decay side: the gradient of the regularization term with respect to each weight matrix is just $2\lambda W^{(i)}$. The cross-entropy side is less immediate, because its gradient has to be propagated back through the network.

Let $G = \nabla_{U^{(2)}} J_{MLE}$, the gradient of the cross-entropy cost with respect to the unnormalized layer-2 output $U^{(2)} = HW^{(2)}$ (the node feeding the cross-entropy in the computational graph).
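As a concrete (assumed) instance: if $J_{MLE}$ is the softmax cross-entropy between the one-hot targets $Y$ and $U^{(2)}$, averaged over an $n$-example minibatch, then

$$G = \nabla_{U^{(2)}} J_{MLE} = \frac{1}{n}\left(\operatorname{softmax}(U^{(2)}) - Y\right),$$

with the softmax applied row-wise.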

Gradient 1: $g_1 = H^T G$ (the cross-entropy contribution to the gradient on $W^{(2)}$)

Gradient 2: $g_2 = \nabla_{H} J = G W^{(2)T}$ (the gradient on the hidden output $H$)

Gradient 3: $g_3 = \mathrm{back\_prop\_relu}(H, g_2)$ (zero out the entries of $g_2$ where the ReLU was inactive, i.e., where $H = 0$)

Gradient 4: $g_4 = X^T g_3$ (the cross-entropy contribution to the gradient on $W^{(1)}$)

Add $g_4$ and $g_1$ to the weight-decay gradients of $W^{(1)}$ and $W^{(2)}$ respectively (the $2\lambda W^{(i)}$ terms plus the back-propagated cross-entropy gradients). This gives the answers:

$\nabla_{W^{(1)}} J = X^T g_3 + 2\lambda W^{(1)}$ and $\nabla_{W^{(2)}} J = H^T G + 2\lambda W^{(2)}$.
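The backward-pass sketch below mirrors gradients 1 through 4 under the same assumptions as the forward-pass sketch (softmax cross-entropy averaged over an $n$-example minibatch, one-hot $Y$); the function and variable names are illustrative only.

```python
import numpy as np

# Backward pass: the four gradients above plus the weight-decay terms.
def backward(X, Y, W1, W2, lam, H, U2):
    n = X.shape[0]
    # G = gradient of J_MLE w.r.t. U2 (softmax cross-entropy case, assumed)
    U2s = U2 - U2.max(axis=1, keepdims=True)
    probs = np.exp(U2s) / np.exp(U2s).sum(axis=1, keepdims=True)
    G = (probs - Y) / n

    g1 = H.T @ G                   # Gradient 1: cross-entropy gradient on W2
    g2 = G @ W2.T                  # Gradient 2: gradient on H
    g3 = g2 * (H > 0)              # Gradient 3: back_prop_relu(H, g2)
    g4 = X.T @ g3                  # Gradient 4: cross-entropy gradient on W1

    grad_W1 = g4 + 2.0 * lam * W1  # gradient of J w.r.t. W1
    grad_W2 = g1 + 2.0 * lam * W2  # gradient of J w.r.t. W2
    return grad_W1, grad_W2
```

A quick finite-difference check against the `forward` sketch on small random matrices is an easy way to validate these expressions.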

[Image: computational graph for the 2-layer MLP]

Updated 2021-06-17

Tags

Data Science
