Formula

Gradient of Objective Function with Respect to Intermediate Variable

Because a neural network's activation function $\phi$ applies elementwise, computing the gradient of the objective function $J$ with respect to the pre-activation intermediate variable $\mathbf{z} \in \mathbb{R}^h$ requires elementwise multiplication (denoted by $\odot$). Using the chain rule through the hidden-layer output $\mathbf{h}$, the gradient is

$$\frac{\partial J}{\partial \mathbf{z}} = \text{prod}\left(\frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{z}}\right) = \frac{\partial J}{\partial \mathbf{h}} \odot \phi'\left(\mathbf{z}\right),$$

where $\phi'(\mathbf{z})$ is the elementwise derivative of the activation function.
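The elementwise backward step can be sketched in plain Python. This is an illustrative example, not code from the book: ReLU is assumed as one concrete choice of $\phi$, and the variable names (`z`, `dJ_dh`) are hypothetical.

```python
# Sketch of dJ/dz = dJ/dh ⊙ phi'(z), assuming phi is ReLU.

def relu_grad(z):
    """phi'(z) for ReLU: 1 where z > 0, else 0 (elementwise)."""
    return [1.0 if zi > 0 else 0.0 for zi in z]

def backward_through_activation(dJ_dh, z):
    """Hadamard product of the upstream gradient with phi'(z)."""
    return [g * d for g, d in zip(dJ_dh, relu_grad(z))]

z = [-1.0, 0.5, 2.0]       # pre-activation vector (hypothetical values)
dJ_dh = [0.3, -0.2, 0.1]   # upstream gradient dJ/dh
dJ_dz = backward_through_activation(dJ_dh, z)
# dJ_dz == [0.0, -0.2, 0.1]: the gradient is zeroed where ReLU was inactive
```

Because $\phi$ acts elementwise, the Jacobian $\partial \mathbf{h}/\partial \mathbf{z}$ is diagonal, which is why the product reduces to an elementwise multiplication rather than a full matrix-vector product.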


Updated 2026-05-06


Dive into Deep Learning @ D2L