Formula

Gradient of Objective Function with Respect to Intermediate Variable

Because a neural network's activation function $\phi$ applies elementwise, computing the gradient of the objective function $J$ with respect to the pre-activation intermediate variable $\mathbf{z} \in \mathbb{R}^h$ requires elementwise multiplication (denoted by $\odot$). Using the chain rule through the hidden-layer output $\mathbf{h}$, the gradient is

$$\frac{\partial J}{\partial \mathbf{z}} = \text{prod}\left(\frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{z}}\right) = \frac{\partial J}{\partial \mathbf{h}} \odot \phi'\left(\mathbf{z}\right),$$

where $\phi'(\mathbf{z})$ is the elementwise derivative of the activation function.
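The elementwise backward step can be sketched in plain Python. This is an illustrative example, not code from the book: ReLU is assumed as one concrete choice of $\phi$, and the variable names (`z`, `dJ_dh`) are hypothetical.

```python
# Sketch of dJ/dz = dJ/dh ⊙ phi'(z), assuming phi is ReLU.

def relu_grad(z):
    """phi'(z) for ReLU: 1 where z > 0, else 0 (elementwise)."""
    return [1.0 if zi > 0 else 0.0 for zi in z]

def backward_through_activation(dJ_dh, z):
    """Hadamard product of the upstream gradient with phi'(z)."""
    return [g * d for g, d in zip(dJ_dh, relu_grad(z))]

z = [-1.0, 0.5, 2.0]       # pre-activation vector (hypothetical values)
dJ_dh = [0.3, -0.2, 0.1]   # upstream gradient dJ/dh
dJ_dz = backward_through_activation(dJ_dh, z)
# dJ_dz == [0.0, -0.2, 0.1]: the gradient is zeroed where ReLU was inactive
```

Because $\phi$ acts elementwise, the Jacobian $\partial \mathbf{h}/\partial \mathbf{z}$ is diagonal, which is why the product reduces to an elementwise multiplication rather than a full matrix-vector product.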


Updated 2026-05-06


Dive into Deep Learning @ D2L