Formula
Gradient of Objective Function with Respect to Hidden Layer Weights
Finally, the gradient of the objective function with respect to the model parameters closest to the input layer, , is calculated. The chain rule combines the gradient propagated backward to the intermediate variable with the explicit gradient from the regularization term : Here, is the transpose of the initial input feature vector.
0
1
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L