
Selective Gradient Propagation for Sub-sequence Loss

In a practical implementation of back-propagation for a sub-sequence loss, the forward and backward passes behave differently. During the forward pass, the complete sequence $[\mathbf{x}_{\mathrm{sample}}, \mathbf{y}_{\mathrm{sample}}]$ is constructed as usual. During the backward pass, however, error gradients are propagated only through the portions of the network that correspond to the output sub-sequence $\mathbf{y}_{\mathrm{sample}}$; the portions corresponding to the input $\mathbf{x}_{\mathrm{sample}}$ receive no error signal in this step.
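
In frameworks with automatic differentiation, this selective propagation falls out of masking the loss rather than the network itself: positions belonging to $\mathbf{x}_{\mathrm{sample}}$ are simply excluded from the loss, so autograd routes no gradient back through them. Below is a minimal PyTorch sketch of this idea; the function name `subsequence_loss` and the `prompt_len` parameter are illustrative assumptions, not from the original text.

```python
import torch
import torch.nn.functional as F

def subsequence_loss(logits: torch.Tensor,
                     token_ids: torch.Tensor,
                     prompt_len: int) -> torch.Tensor:
    """Cross-entropy over the output sub-sequence y_sample only.

    logits:     (batch, seq_len, vocab), from one forward pass over the
                full sequence [x_sample, y_sample].
    token_ids:  (batch, seq_len), token ids of the full sequence.
    prompt_len: length of x_sample (assumed equal across the batch).
    """
    # Standard causal-LM shift: the logits at position t predict token t + 1.
    shift_logits = logits[:, :-1, :]
    shift_labels = token_ids[:, 1:].clone()

    # Mask the prompt positions. With ignore_index=-100 these positions
    # contribute nothing to the loss, so in the backward pass error
    # gradients flow only through the parts of the network that produced
    # y_sample; the prompt portion receives no error signal.
    shift_labels[:, : prompt_len - 1] = -100

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```

Note that the forward pass is untouched: the model still runs over the full sequence, and only the loss (and therefore the backward pass) is restricted to $\mathbf{y}_{\mathrm{sample}}$.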

