Formula
Gradient of Objective Function with Respect to Output Layer Weights
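The gradient of the regularized objective function $J = L + s$ with respect to the model parameters closest to the output layer, $\mathbf{W}^{(2)}$, is calculated using the chain rule. It combines the gradient propagated through the output layer variable $\mathbf{o}$ with the explicit gradient of the regularization term $s$:

$$\frac{\partial J}{\partial \mathbf{W}^{(2)}} = \textrm{prod}\left(\frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{W}^{(2)}}\right) + \textrm{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(2)}}\right) = \frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top + \lambda \mathbf{W}^{(2)},$$

where $\mathbf{h}^\top$ represents the transpose of the hidden layer activation vector and $\lambda$ is the weight of the $\ell_2$ regularization term.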
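As a rough sanity check, the sketch below computes this gradient as an outer product plus a weight-decay term and compares it with finite differences. It is only an illustration under stated assumptions: the loss is taken to be squared error, the names `W2`, `h`, `y`, `lam` and all helper functions are hypothetical, and the numbers are arbitrary toy values.

```python
def matvec(W, h):
    """Compute o = W h for a matrix stored as a list of rows."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def objective(W2, h, y, lam):
    """J = L + s with an assumed squared-error loss L = 0.5 * ||o - y||^2.

    Only the W2 part of the L2 penalty is included, because any penalty on
    earlier layers is constant with respect to W2 and does not affect
    this gradient.
    """
    o = matvec(W2, h)
    loss = 0.5 * sum((o_i - y_i) ** 2 for o_i, y_i in zip(o, y))
    penalty = 0.5 * lam * sum(w ** 2 for row in W2 for w in row)
    return loss + penalty


def grad_W2(W2, h, y, lam):
    """Analytic gradient: dJ/dW2 = (dJ/do) h^T + lam * W2."""
    o = matvec(W2, h)
    dJ_do = [o_i - y_i for o_i, y_i in zip(o, y)]  # gradient of the assumed loss
    return [
        [dJ_do[i] * h[j] + lam * W2[i][j] for j in range(len(h))]
        for i in range(len(W2))
    ]


def numeric_grad_W2(W2, h, y, lam, eps=1e-6):
    """Central finite differences, used only to sanity-check grad_W2."""
    grad = [[0.0] * len(h) for _ in W2]
    for i in range(len(W2)):
        for j in range(len(h)):
            original = W2[i][j]
            W2[i][j] = original + eps
            j_plus = objective(W2, h, y, lam)
            W2[i][j] = original - eps
            j_minus = objective(W2, h, y, lam)
            W2[i][j] = original  # restore the entry before moving on
            grad[i][j] = (j_plus - j_minus) / (2 * eps)
    return grad


# Toy values (assumptions for illustration): 2 outputs, 3 hidden units.
W2 = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
h = [1.0, 2.0, -1.0]
y = [0.0, 1.0]
lam = 0.1

analytic = grad_W2(W2, h, y, lam)
numeric = numeric_grad_W2(W2, h, y, lam)
max_diff = max(
    abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn)
)
print("largest analytic vs. numeric difference:", max_diff)
```

The result has the same shape as `W2`: each entry pairs one component of the output-layer gradient with one hidden activation, which is why the transpose of the hidden vector appears in the formula.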
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L