Selective Gradient Propagation for Sub-sequence Loss
In a practical implementation of back-propagation for a sub-sequence loss, the forward and backward passes behave differently. During the forward pass, the complete sequence, sample = [x_sample, y_sample], is processed normally. During the backward pass, however, error gradients are propagated back only from the positions that correspond to the output sub-sequence, y_sample; the positions corresponding to x_sample contribute no loss and therefore emit no error signal of their own during this step.
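The sketch below illustrates this masking idea in PyTorch. It is a minimal toy, not the book's implementation: TinyLM is a stand-in for a real causal language model, the token ids are arbitrary, and the key point is that prompt positions are labeled -100 so that cross_entropy's ignore_index excludes them from the loss.

    import torch
    import torch.nn.functional as F

    class TinyLM(torch.nn.Module):
        """A toy stand-in for a causal language model (assumed, for illustration)."""
        def __init__(self, vocab=128, d=32):
            super().__init__()
            self.emb = torch.nn.Embedding(vocab, d)
            self.out = torch.nn.Linear(d, vocab)
        def forward(self, ids):
            return self.out(self.emb(ids))  # [batch, time, vocab] logits

    model = TinyLM()
    prompt_ids = torch.tensor([[5, 17, 42, 9]])   # x_sample (arbitrary ids)
    response_ids = torch.tensor([[23, 71, 2]])    # y_sample (arbitrary ids)
    input_ids = torch.cat([prompt_ids, response_ids], dim=1)

    # Next-token labels; prompt positions get -100 so they are excluded
    # from the loss (and therefore produce no error signal).
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100

    logits = model(input_ids)
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 1..T-1
        labels[:, 1:].reshape(-1),                    # targets shifted by one
        ignore_index=-100,
    )
    loss.backward()  # gradients originate only at y_sample positions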

Related
Selective Gradient Propagation for Sub-sequence Loss
A language model's performance on a single training sample is measured by calculating the negative logarithm of the probability it assigns to the correct target output sub-sequence, given an input sequence. Consider two models, Model A and Model B, being evaluated on the same sample. For this sample, Model A assigns a probability of 0.8 to the correct target sub-sequence, while Model B assigns a probability of 0.2. Based on this information, which statement correctly analyzes the models' performance on this specific sample?
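As a quick check of the arithmetic this question relies on (the loss here is just -log P(y_sample | x_sample); the variable names are ours):

    import math

    loss_a = -math.log(0.8)  # Model A: ~0.223
    loss_b = -math.log(0.2)  # Model B: ~1.609
    print(loss_a < loss_b)   # True: the higher-probability model has the lower loss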
Calculating Prediction Loss
Evaluating Model Performance on Different Samples
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
For a supervised fine-tuning task, a single training instance consists of an input segment (x_sample) and a corresponding output segment (y_sample). If x_sample is 'Instruction: Translate to Spanish. Input: Hello.' and y_sample is 'Response: Hola.', which of the following represents the correct structure for the final combined sample that the model will process?
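As a hedged illustration of the structure the question is after (string concatenation stands in for token-level concatenation; the variable names are ours):

    x_sample = "Instruction: Translate to Spanish. Input: Hello."
    y_sample = "Response: Hola."

    # sample = [x_sample, y_sample]: the input segment precedes the output
    # segment, so a causal model reads the instruction before producing the response.
    sample = x_sample + " " + y_sample
    print(sample)
    # Instruction: Translate to Spanish. Input: Hello. Response: Hola.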
Deconstructing a Fine-Tuning Sample
In preparing a data sample for supervised fine-tuning, a common practice is to structure the sample by concatenating the input segment (x_sample) and the output segment (y_sample) into a single sequence: sample = [x_sample, y_sample]. What is the primary reason for placing the input segment before the output segment in this structure?
Learn After
Example of Context and Prediction Sub-sequences
A developer is fine-tuning a language model on a dataset where each entry consists of a context and a desired completion. For training, the context and completion are concatenated into a single input sequence. The training objective is configured so that the loss is calculated only on the model's predictions for the completion part of the sequence. Given this setup, which statement accurately describes how the model's parameters are updated during the backward pass for a single training step?
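A minimal way to see the intended behavior empirically, assuming a toy recurrent model as a stand-in for any causal architecture (all names here are illustrative): even though the loss is computed only on completion predictions, parameters that processed the context still receive gradients, because the completion predictions depend on the context representations.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    emb = torch.nn.Embedding(50, 16)
    rnn = torch.nn.GRU(16, 16, batch_first=True)  # toy causal backbone
    head = torch.nn.Linear(16, 50)

    context = torch.tensor([[3, 7, 11]])          # arbitrary context token ids
    completion = torch.tensor([[21, 22]])         # arbitrary completion token ids
    ids = torch.cat([context, completion], dim=1)

    hidden, _ = rnn(emb(ids))
    logits = head(hidden)

    # Restrict the next-token loss to predictions of the completion tokens.
    preds, targets = logits[:, :-1], ids[:, 1:]
    mask = torch.zeros_like(targets, dtype=torch.bool)
    mask[:, -completion.size(1):] = True
    loss = F.cross_entropy(preds[mask], targets[mask])
    loss.backward()

    # Gradients reach parameters (and embeddings) that only processed the context.
    print(emb.weight.grad[3].abs().sum() > 0)     # context token embedding: True
    print(rnn.weight_ih_l0.grad.abs().sum() > 0)  # recurrent weights: True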
Debugging a Fine-Tuning Gradient Flow
Implications of Selective Gradient Propagation