1Cademy - Gradient of RNN Objective with Respect to Output Weights

Learn Before

Backpropagation Through Time (BPTT)

Formula

Gradient of RNN Objective with Respect to Output Weights

For a recurrent neural network, the gradient of the objective function $L$ with respect to the output-layer weight matrix $\mathbf{W}_\textrm{qh}$ is obtained by summing the per-step contributions over all $T$ time steps. Because the objective function depends on $\mathbf{W}_\textrm{qh}$ via the sequence of outputs $\mathbf{o}_1, \ldots, \mathbf{o}_T$, applying the chain rule gives:

$\frac{\partial L}{\partial \mathbf{W}_\textrm{qh}} = \sum_{t=1}^T \textrm{prod}\left(\frac{\partial L}{\partial \mathbf{o}_t}, \frac{\partial \mathbf{o}_t}{\partial \mathbf{W}_\textrm{qh}}\right) = \sum_{t=1}^T \frac{\partial L}{\partial \mathbf{o}_t} \mathbf{h}_t^\top$

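where $\mathbf{h}_t^\top$ is the transpose of the hidden state at time step $t$, and $\partial L/\partial \mathbf{o}_t$ is the gradient of the objective with respect to the model output at that time step. The second equality holds when the output layer is linear in the hidden state, $\mathbf{o}_t = \mathbf{W}_\textrm{qh} \mathbf{h}_t$, so that each time step contributes the outer product of $\partial L/\partial \mathbf{o}_t$ and $\mathbf{h}_t$.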
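The sum of outer products can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not part of the formula itself: a linear output layer $\mathbf{o}_t = \mathbf{W}_\textrm{qh} \mathbf{h}_t$ with no bias, a squared-error loss averaged over time steps, and hidden states treated as fixed inputs (they do not depend on $\mathbf{W}_\textrm{qh}$). The function names `loss`, `grad_W_qh`, and `numerical_grad` and the toy data are hypothetical.

```python
def matvec(W, h):
    """Matrix-vector product for a list-of-rows matrix W and a vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def loss(W_qh, hs, ys):
    """Assumed objective: L = (1/T) * sum_t 0.5 * ||o_t - y_t||^2, with o_t = W_qh h_t."""
    T = len(hs)
    total = 0.0
    for h, y in zip(hs, ys):
        o = matvec(W_qh, h)
        total += 0.5 * sum((oi - yi) ** 2 for oi, yi in zip(o, y))
    return total / T


def grad_W_qh(W_qh, hs, ys):
    """Analytic gradient: sum over t of the outer product (dL/do_t) h_t^T."""
    T = len(hs)
    q, n_hidden = len(W_qh), len(W_qh[0])
    grad = [[0.0] * n_hidden for _ in range(q)]
    for h, y in zip(hs, ys):
        o = matvec(W_qh, h)
        # For the assumed loss, dL/do_t = (o_t - y_t) / T.
        dL_do = [(oi - yi) / T for oi, yi in zip(o, y)]
        for i in range(q):
            for j in range(n_hidden):
                grad[i][j] += dL_do[i] * h[j]
    return grad


def numerical_grad(W_qh, hs, ys, eps=1e-6):
    """Central finite-difference estimate of dL/dW_qh, one entry at a time."""
    q, n_hidden = len(W_qh), len(W_qh[0])
    grad = [[0.0] * n_hidden for _ in range(q)]
    for i in range(q):
        for j in range(n_hidden):
            W_plus = [row[:] for row in W_qh]
            W_minus = [row[:] for row in W_qh]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            grad[i][j] = (loss(W_plus, hs, ys) - loss(W_minus, hs, ys)) / (2 * eps)
    return grad


# Toy data: T = 3 time steps, hidden size 3, output size 2.
W_qh = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
hs = [[1.0, 0.5, -0.5], [0.0, 2.0, 1.0], [-1.0, 0.25, 0.75]]
ys = [[0.0, 1.0], [1.0, -1.0], [0.5, 0.5]]

analytic = grad_W_qh(W_qh, hs, ys)
numeric = numerical_grad(W_qh, hs, ys)
max_abs_diff = max(
    abs(a - n)
    for row_a, row_n in zip(analytic, numeric)
    for a, n in zip(row_a, row_n)
)
print("max |analytic - numeric| =", max_abs_diff)
```

The analytic gradient has the same shape as $\mathbf{W}_\textrm{qh}$ (output size by hidden size), and under these assumptions it should agree with the finite-difference estimate up to rounding error. A different loss changes only how $\partial L/\partial \mathbf{o}_t$ is computed; the outer-product structure stays the same.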


Updated 2026-05-14

Contributors are:

Who are from:

University of Michigan - Ann Arbor
