Formula
Gradient of Objective Function with Respect to Intermediate Variable
Because a neural network's activation function $\phi$ applies elementwise, computing the gradient of the objective function $J$ with respect to the pre-activation intermediate variable $\mathbf{z}$ requires elementwise multiplication (denoted by $\odot$). Using the chain rule through the hidden-layer output $\mathbf{h} = \phi(\mathbf{z})$, the gradient is calculated as: \frac{\partial J}{\partial \mathbf{z}} = \text{prod}\left(\frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{z}}\right) = \frac{\partial J}{\partial \mathbf{h}} \odot \phi'\left(\mathbf{z}\right) where $\phi'(\mathbf{z})$ represents the elementwise local derivative of the activation function.
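The elementwise product above can be sketched numerically. This is a minimal illustration assuming a ReLU activation, $\phi(\mathbf{z}) = \max(\mathbf{z}, 0)$, whose local derivative is an indicator mask; the array values are made up for demonstration.

```python
import numpy as np

# Assumed example values (not from the original text):
z = np.array([1.5, -0.3, 0.8, -2.0])      # pre-activation intermediate variable z
dJ_dh = np.array([0.2, -0.5, 0.1, 0.7])   # upstream gradient dJ/dh

# For ReLU, phi'(z) = 1 where z > 0, else 0.
phi_prime = (z > 0).astype(z.dtype)

# Gradient w.r.t. z is the elementwise (Hadamard) product dJ/dh ⊙ phi'(z).
dJ_dz = dJ_dh * phi_prime
print(dJ_dz)
```

Because the derivative acts elementwise, the upstream gradient is simply masked: entries where $z \le 0$ receive zero gradient.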
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L