Formula

Gradient of Objective Function with Respect to Hidden Layer Weights

Finally, the gradient of the objective function $J$ with respect to the model parameters closest to the input layer, $\mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$, is calculated. The chain rule combines the gradient propagated backward to the intermediate variable $\mathbf{z}$ with the explicit gradient from the regularization term $s$:

$$\frac{\partial J}{\partial \mathbf{W}^{(1)}} = \textrm{prod}\left(\frac{\partial J}{\partial \mathbf{z}}, \frac{\partial \mathbf{z}}{\partial \mathbf{W}^{(1)}}\right) + \textrm{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(1)}}\right) = \frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top + \lambda \mathbf{W}^{(1)}$$

Here, $\mathbf{x}^\top$ is the transpose of the input feature vector.
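As a minimal sketch of this step, assuming NumPy and hypothetical names `dJ_dz`, `x`, `W1`, and `lam` for $\frac{\partial J}{\partial \mathbf{z}}$, $\mathbf{x}$, $\mathbf{W}^{(1)}$, and $\lambda$, the gradient is just an outer product plus the regularization term:

```python
import numpy as np

# Hypothetical sizes: d input features, h hidden units.
d, h = 4, 5
lam = 0.01                      # regularization strength lambda

x = np.random.randn(d)          # input feature vector, shape (d,)
W1 = np.random.randn(h, d)      # hidden-layer weights W^(1), shape (h, d)
dJ_dz = np.random.randn(h)      # gradient propagated back to z, shape (h,)

# dJ/dW^(1) = (dJ/dz) x^T + lambda * W^(1), an (h, d) matrix
dJ_dW1 = np.outer(dJ_dz, x) + lam * W1
print(dJ_dW1.shape)             # (5, 4), matching W^(1)
```

The outer product $\frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top$ has shape $h \times d$, so the resulting gradient matches the shape of $\mathbf{W}^{(1)}$ as required for a gradient-descent update.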

