Formula

Gradient of Objective Function with Respect to Output Layer Weights

The gradient of the regularized objective function $J$ with respect to the model parameters closest to the output layer, $\mathbf{W}^{(2)} \in \mathbb{R}^{q \times h}$, is calculated using the chain rule. It combines the gradient propagated through the output layer variable $\mathbf{o}$ with the explicit gradient of the regularization term $s$:

$$\frac{\partial J}{\partial \mathbf{W}^{(2)}} = \textrm{prod}\left(\frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{W}^{(2)}}\right) + \textrm{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(2)}}\right) = \frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top + \lambda \mathbf{W}^{(2)},$$

where $\mathbf{h}^\top$ is the transpose of the hidden layer activation vector. Since the $\ell_2$ regularization term is $s = \frac{\lambda}{2}\left(\|\mathbf{W}^{(1)}\|_F^2 + \|\mathbf{W}^{(2)}\|_F^2\right)$ and $\partial J/\partial s = 1$, the second summand reduces to $\lambda \mathbf{W}^{(2)}$.
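As a concrete check of this formula, here is a minimal NumPy sketch. The sizes ($d = 5$, $h = 4$, $q = 3$), the ReLU activation, the squared-error loss, and $\lambda = 0.01$ are all illustrative assumptions, not fixed by the text; the point is only that the closed-form gradient $\frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top + \lambda \mathbf{W}^{(2)}$ matches a finite-difference approximation of $\partial J / \partial \mathbf{W}^{(2)}$.

```python
import numpy as np

# Minimal sketch (assumed sizes, activation, and loss): verify
#   dJ/dW2 = (dJ/do) h^T + lambda * W2
# for a one-hidden-layer net with ReLU, squared-error loss, and
# L2 penalty s = (lambda/2) * ||W2||_F^2. The W1 part of s is constant
# with respect to W2, so it is omitted from the objective below.

rng = np.random.default_rng(0)
d, h_dim, q, lam = 5, 4, 3, 0.01     # input size d, hidden size h, output size q, weight decay

x = rng.normal(size=(d,))            # input example
W1 = rng.normal(size=(h_dim, d))     # hidden-layer weights W^(1)
W2 = rng.normal(size=(q, h_dim))     # output-layer weights W^(2) in R^{q x h}
y = rng.normal(size=(q,))            # target

h = np.maximum(W1 @ x, 0.0)          # hidden activation h = phi(W1 x)
o = W2 @ h                           # output layer variable o = W2 h
dJ_do = o - y                        # dJ/do for the loss 1/2 * ||o - y||^2

# Closed-form gradient from the formula above: (dJ/do) h^T + lambda * W2.
grad_formula = np.outer(dJ_do, h) + lam * W2

def objective(W2_):
    """J(W2) = squared-error loss plus the W2 part of the L2 penalty."""
    o_ = W2_ @ h
    return 0.5 * np.sum((o_ - y) ** 2) + 0.5 * lam * np.sum(W2_ ** 2)

# Central finite differences, entry by entry.
eps = 1e-6
grad_fd = np.zeros_like(W2)
for i in range(q):
    for j in range(h_dim):
        Wp, Wm = W2.copy(), W2.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_fd[i, j] = (objective(Wp) - objective(Wm)) / (2 * eps)

print(np.max(np.abs(grad_formula - grad_fd)))  # tiny (~1e-10): the gradients agree
```

The printed discrepancy should be on the order of $10^{-10}$, confirming numerically that the two $\textrm{prod}$ terms collapse to the outer product $\frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top$ plus the weight-decay term $\lambda \mathbf{W}^{(2)}$.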

Tags: D2L, Dive into Deep Learning @ D2L