Gradient of RNN Objective with Respect to Output Weights

For a recurrent neural network, the gradient of the objective function $L$ with respect to the output layer weight parameter $\mathbf{W}_\textrm{qh}$ is calculated by summing the per-step gradients over all $T$ time steps. Because the objective function depends on $\mathbf{W}_\textrm{qh}$ via the sequence of outputs $\mathbf{o}_1, \ldots, \mathbf{o}_T$, we apply the chain rule to obtain:

$$\frac{\partial L}{\partial \mathbf{W}_\textrm{qh}} = \sum_{t=1}^T \textrm{prod}\left(\frac{\partial L}{\partial \mathbf{o}_t}, \frac{\partial \mathbf{o}_t}{\partial \mathbf{W}_\textrm{qh}}\right) = \sum_{t=1}^T \frac{\partial L}{\partial \mathbf{o}_t} \mathbf{h}_t^\top$$

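where $\mathbf{h}_t^\top$ is the transpose of the hidden state at time step $t$, and $\partial L/\partial \mathbf{o}_t$ is the gradient of the objective with respect to the model output at that time step. The second equality follows from the linear output layer $\mathbf{o}_t = \mathbf{W}_\textrm{qh} \mathbf{h}_t$ used in D2L's analysis: each time step contributes the outer product of $\partial L/\partial \mathbf{o}_t$ and $\mathbf{h}_t$, which has the same shape as $\mathbf{W}_\textrm{qh}$.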
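
As a quick sanity check, the sum of outer products can be compared with a finite-difference estimate. The sketch below is illustrative only and rests on several assumptions that are not part of the formula above: a linear output layer $\mathbf{o}_t = \mathbf{W}_\textrm{qh} \mathbf{h}_t$, a squared-error loss averaged over time steps, hidden states treated as fixed inputs, and hypothetical helper names (`objective`, `grad_W_qh`, `numerical_grad`). It uses plain Python lists so that it runs without any third-party library.

```python
import random

def matvec(W, h):
    """Matrix-vector product for a list-of-rows matrix W and a list h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def objective(W_qh, hs, ys):
    """L = (1/T) * sum_t 0.5 * ||W_qh h_t - y_t||^2.
    The squared-error loss is an assumption made for this sketch."""
    T = len(hs)
    total = 0.0
    for h, y in zip(hs, ys):
        o = matvec(W_qh, h)
        total += 0.5 * sum((o_i - y_i) ** 2 for o_i, y_i in zip(o, y))
    return total / T

def grad_W_qh(W_qh, hs, ys):
    """Analytic gradient: sum over t of outer(dL/do_t, h_t)."""
    T = len(hs)
    q, n_h = len(W_qh), len(W_qh[0])
    grad = [[0.0] * n_h for _ in range(q)]
    for h, y in zip(hs, ys):
        o = matvec(W_qh, h)
        # For the assumed loss, dL/do_t = (o_t - y_t) / T.
        dL_do = [(o_i - y_i) / T for o_i, y_i in zip(o, y)]
        for i in range(q):
            for j in range(n_h):
                grad[i][j] += dL_do[i] * h[j]
    return grad

def numerical_grad(W_qh, hs, ys, eps=1e-6):
    """Central finite differences, one entry of W_qh at a time."""
    q, n_h = len(W_qh), len(W_qh[0])
    grad = [[0.0] * n_h for _ in range(q)]
    for i in range(q):
        for j in range(n_h):
            original = W_qh[i][j]
            W_qh[i][j] = original + eps
            plus = objective(W_qh, hs, ys)
            W_qh[i][j] = original - eps
            minus = objective(W_qh, hs, ys)
            W_qh[i][j] = original  # restore the entry
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

random.seed(0)
T, q, n_h = 4, 2, 3  # time steps, output size, hidden size (arbitrary)
W_qh = [[random.uniform(-1, 1) for _ in range(n_h)] for _ in range(q)]
hs = [[random.uniform(-1, 1) for _ in range(n_h)] for _ in range(T)]
ys = [[random.uniform(-1, 1) for _ in range(q)] for _ in range(T)]

analytic = grad_W_qh(W_qh, hs, ys)
numeric = numerical_grad(W_qh, hs, ys)
max_abs_diff = max(
    abs(a - b)
    for row_a, row_n in zip(analytic, numeric)
    for a, b in zip(row_a, row_n)
)
print("max |analytic - numeric| =", max_abs_diff)
```

The two gradients should agree up to floating-point rounding, and the result has shape `q` by `n_h`, matching $\mathbf{W}_\textrm{qh}$. In a full RNN the hidden states depend on other parameters, but not on $\mathbf{W}_\textrm{qh}$, which is why they can be held fixed here.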

Updated 2026-05-14

Tags

Data Science

D2L

Dive into Deep Learning @ D2L