Learn Before
Token-Level Loss Calculation in a Backward Pass
During the backward pass in training an autoregressive language model, the error signal is derived from a loss that compares the model's predictions to the actual target tokens. A key aspect of this process is that the loss is computed only for the output (target) portion of the sequence. For an input sequence x1, x2, x3 with a target output y1, y2, the loss at the input-token positions is masked to zero. Consequently, the gradients used to update the model's weights originate only from the target-token positions (y1, y2), since those are the only positions with a non-zero loss. These gradients are then propagated backward through the network to adjust the parameters.
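As an illustration, here is a minimal PyTorch-style sketch of this masking. It is a hypothetical toy setup, not from the source: the vocabulary size, the random logits, and the target token ids 7 and 2 are placeholders, and for simplicity it ignores the usual one-position shift between logits and labels in autoregressive training.

import torch
import torch.nn.functional as F

# Toy setup: 5 positions (x1, x2, x3, y1, y2) over a small vocabulary.
vocab_size = 10
logits = torch.randn(5, vocab_size, requires_grad=True)  # one row of scores per position

# Labels: -100 marks the input positions x1..x3, so cross_entropy skips them
# (ignore_index defaults to -100); only y1 and y2 contribute to the loss.
# Token ids 7 and 2 are arbitrary placeholders for y1 and y2.
labels = torch.tensor([-100, -100, -100, 7, 2])

loss = F.cross_entropy(logits, labels)
loss.backward()

# Gradient rows for the masked input positions are all zero; the error
# signal originates only from the two target positions.
print(logits.grad)

Instruction-tuning pipelines commonly implement exactly this pattern: the prompt positions of the label tensor are set to the ignore index before the loss is computed, so only completion tokens produce gradients.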

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Backpropagation Through Time (BPTT)
Back-Propagating through Discrete Stochastic Operations
Neural Network Learning Rate
Back-Propagation through Random Operations
Backward Propagation Formulation
True/False: During forward propagation, the forward function for a layer l needs to know which activation function (Sigmoid, tanh, ReLU, etc.) that layer uses. During backpropagation, the corresponding backward function also needs to know the layer's activation function, since the gradient depends on it.
Back Propagation Illustrated Example
A neural network is trained to distinguish between images of 'apples' and 'oranges'. During a training iteration, it is shown an image of an apple but predicts 'orange' with a high degree of certainty. This results in a significant error value. What is the primary computational goal of the backpropagation step that immediately follows this prediction?
Token-Level Loss Calculation in a Backward Pass
Consider a simple neural network with one input neuron, one hidden neuron, and one output neuron. The network has a weight w1 connecting the input to the hidden neuron, and a weight w2 connecting the hidden neuron to the output neuron. After a forward pass, an error is calculated based on the network's final output. To update w1 using the backpropagation algorithm, you must calculate the partial derivative of the error with respect to w1. Which of the following components is essential for determining how much of the final error is attributable to the hidden neuron's activity?
Allocating Gradient Memory
Chain Rule for Tensors
Storage of Intermediate Variables in Backpropagation
Learn After
An autoregressive language model is being trained on a single data instance. The model is provided with the input context tokens ['The', 'quick', 'brown'] and is trained to generate the target completion tokens ['fox', 'jumps']. During the backward pass for this specific training step, from which token positions will the error signals (gradients) used to update the model's weights primarily originate?
Debugging Language Model Training
When fine-tuning an autoregressive language model on a dataset where each example consists of an input prompt and a target completion, the training loss is calculated across all tokens in the combined sequence (prompt + completion) to ensure the model understands the full context.
Example of Loss Calculation in Instruction Fine-Tuning