Formula

Gradient of Objective Function with Respect to Hidden Layer Weights

Finally, the gradient of the objective function $J$ with respect to the model parameters closest to the input layer, $\mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$, is calculated. The chain rule combines the gradient propagated backward to the intermediate variable $\mathbf{z}$ with the explicit gradient from the regularization term $s$:

$$\frac{\partial J}{\partial \mathbf{W}^{(1)}} = \textrm{prod}\left(\frac{\partial J}{\partial \mathbf{z}}, \frac{\partial \mathbf{z}}{\partial \mathbf{W}^{(1)}}\right) + \textrm{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(1)}}\right) = \frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top + \lambda \mathbf{W}^{(1)}$$

Here, $\mathbf{x}^\top$ is the transpose of the input feature vector.
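As a minimal sketch of this step, assuming NumPy and hypothetical names `dJ_dz`, `x`, `W1`, and `lam` for $\frac{\partial J}{\partial \mathbf{z}}$, $\mathbf{x}$, $\mathbf{W}^{(1)}$, and $\lambda$, the gradient is just an outer product plus the regularization term:

```python
import numpy as np

# Hypothetical sizes: d input features, h hidden units.
d, h = 4, 5
lam = 0.01                      # regularization strength lambda

x = np.random.randn(d)          # input feature vector, shape (d,)
W1 = np.random.randn(h, d)      # hidden-layer weights W^(1), shape (h, d)
dJ_dz = np.random.randn(h)      # gradient propagated back to z, shape (h,)

# dJ/dW^(1) = (dJ/dz) x^T + lambda * W^(1), an (h, d) matrix
dJ_dW1 = np.outer(dJ_dz, x) + lam * W1
print(dJ_dW1.shape)             # (5, 4), matching W^(1)
```

The outer product $\frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top$ has shape $h \times d$, so the resulting gradient matches the shape of $\mathbf{W}^{(1)}$ as required for a gradient-descent update.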

