Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
When evaluating a model on a specific training instance, the loss function is calculated solely over the target sub-sequence y_sample, instead of the full sequence. For a model defined by parameters θ, this loss is expressed as the negative log-likelihood of the probability of generating the output sub-sequence y_sample, given the input sub-sequence x_sample. The formula is:
Loss_θ(sample) = −log Pr_θ(y_sample | x_sample)
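As a minimal Python sketch of this formula (assuming per-token probabilities from the model are available; `sample_nll`, `token_probs`, and `target_mask` are illustrative names, not from the source):

```python
import math

def sample_nll(token_probs, target_mask):
    """Sample-wise negative log-likelihood, summed only over the
    target sub-sequence y_sample (mask = 1); tokens belonging to the
    input sub-sequence x_sample (mask = 0) contribute no loss."""
    return -sum(math.log(p) for p, m in zip(token_probs, target_mask) if m)

# Probabilities for [x_sample tokens..., y_sample tokens...]:
loss = sample_nll([0.9, 0.8, 0.5, 0.25], [0, 0, 1, 1])
# loss = -(log 0.5 + log 0.25) ≈ 2.079
```

Note that the first two probabilities are ignored entirely: only the positions masked as targets affect the loss, which is exactly the "sub-sequence" restriction described above.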
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Related
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
Sequence-Level Loss
An engineer is training a model on a large dataset. They are monitoring two metrics:
- Metric A: A value calculated for each individual data sample. This value fluctuates significantly from one sample to the next.
- Metric B: A single, aggregate value calculated after the model has processed the entire training dataset. This value shows a steady, downward trend over multiple passes through the dataset.
Based on the standard terminology for measuring a model's performance, what is the most accurate way to classify these two metrics?
Interpreting Training Metrics
Match each term to its most accurate description regarding how a model's performance is measured during training.
Loss Function for RNN
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
Cross-Entropy Loss for Knowledge Distillation
A language model is being trained to generate the four-word sentence 'The quick brown fox'. The model generates one word at a time, and the error (loss) is calculated at each step:
- Loss for 'The' = 0.1
- Loss for 'quick' = 0.3
- Loss for 'brown' = 0.2
- Loss for 'fox' = 0.4
To update the model's parameters, the training process computes a single, overall loss value for the entire sentence. Which statement best analyzes this method of calculating the overall loss?
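The arithmetic above can be sketched in a few lines (a hedged illustration; it assumes, as is standard for autoregressive training, that the sentence-level loss is the sum of the per-token losses, optionally normalized by length):

```python
# Per-token losses from the example sentence 'The quick brown fox'.
token_losses = {"The": 0.1, "quick": 0.3, "brown": 0.2, "fox": 0.4}

total_loss = sum(token_losses.values())        # summed sequence-level loss
average_loss = total_loss / len(token_losses)  # length-normalized variant
```

Summing makes every token's error contribute to the single value used for the parameter update; dividing by the sequence length makes losses comparable across sentences of different lengths.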
Total Loss Calculation for a Token Sequence
Calculating Average Sequence-Level Loss
Evaluating Training Strategies for a Translation Model
Selective Gradient Propagation for Sub-sequence Loss
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
For a supervised fine-tuning task, a single training instance consists of an input segment (x_sample) and a corresponding output segment (y_sample). If x_sample is 'Instruction: Translate to Spanish. Input: Hello.' and y_sample is 'Response: Hola.', which of the following represents the correct structure for the final combined sample that the model will process?
Deconstructing a Fine-Tuning Sample
In preparing a data sample for supervised fine-tuning, a common practice is to structure the sample by concatenating the input segment (x_sample) and the output segment (y_sample) into a single sequence: sample = [x_sample, y_sample]. What is the primary reason for placing the input segment before the output segment in this structure?
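A small sketch of building the combined sample (hedged: it follows the standard input-then-output order used for autoregressive fine-tuning; the variable names and the space separator are illustrative, not from the source):

```python
x_sample = "Instruction: Translate to Spanish. Input: Hello."
y_sample = "Response: Hola."

# Concatenate the input first, then the output: the autoregressive model
# then predicts y_sample conditioned on x_sample, and the loss is
# computed only over the y_sample portion of the sequence.
sample = x_sample + " " + y_sample
```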
Learn After
Selective Gradient Propagation for Sub-sequence Loss
A language model's performance on a single training sample is measured by calculating the negative logarithm of the probability it assigns to the correct target output sub-sequence, given an input sequence. Consider two models, Model A and Model B, being evaluated on the same sample. For this sample, Model A assigns a probability of 0.8 to the correct target sub-sequence, while Model B assigns a probability of 0.2. Based on this information, which statement correctly analyzes the models' performance on this specific sample?
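The comparison can be checked numerically (a hedged sketch of the negative-log computation described above; the natural logarithm is assumed):

```python
import math

# Negative log-likelihood each model incurs on this sample.
loss_a = -math.log(0.8)  # ≈ 0.223: higher probability, lower loss
loss_b = -math.log(0.2)  # ≈ 1.609: lower probability, higher loss
```

Because −log is strictly decreasing on (0, 1], Model A's higher probability for the correct target sub-sequence necessarily yields the smaller loss on this sample.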
Calculating Prediction Loss
Evaluating Model Performance on Different Samples