Evaluating Model Performance on Different Samples
A language model is being fine-tuned on two different training samples. The model's goal is to minimize the negative log-likelihood loss for the correct output sub-sequence. Analyze the two scenarios below and determine on which sample the model performed worse. Justify your answer based on how the loss is calculated.
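As a minimal sketch of how that loss is computed: the negative log-likelihood of a target sub-sequence is the sum of -log p over the probability the model assigns to each correct token. The per-token probabilities below are illustrative assumptions, not values from the original samples.

```python
import math

def subsequence_nll(token_probs):
    """Negative log-likelihood of a target sub-sequence:
    sum of -log p over the probability assigned to each correct token."""
    return sum(-math.log(p) for p in token_probs)

# Hypothetical per-token probabilities for two training samples.
sample_1 = [0.9, 0.8, 0.85]  # model fairly confident on every token
sample_2 = [0.6, 0.3, 0.5]   # model noticeably less confident

loss_1 = subsequence_nll(sample_1)
loss_2 = subsequence_nll(sample_2)

# The sample with the higher loss is the one the model performed worse on.
worse = "sample 2" if loss_2 > loss_1 else "sample 1"
```

Because -log p grows as p shrinks, the less confident sample always yields the larger loss.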
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Selective Gradient Propagation for Sub-sequence Loss
A language model's performance on a single training sample is measured by calculating the negative logarithm of the probability it assigns to the correct target output sub-sequence, given an input sequence. Consider two models, Model A and Model B, being evaluated on the same sample. For this sample, Model A assigns a probability of 0.8 to the correct target sub-sequence, while Model B assigns a probability of 0.2. Based on this information, which statement correctly analyzes the models' performance on this specific sample?
Calculating Prediction Loss
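The Model A vs. Model B comparison in the related question above can be checked numerically. This sketch just evaluates -log p for the two stated probabilities (0.8 and 0.2); no other values are assumed.

```python
import math

# Probability each model assigns to the correct target sub-sequence.
p_model_a = 0.8
p_model_b = 0.2

# Negative log-likelihood loss for each model on this sample.
loss_a = -math.log(p_model_a)
loss_b = -math.log(p_model_b)

# Lower loss means better performance on this sample,
# so Model A (loss ~0.223) outperforms Model B (loss ~1.609).
```

Note that the loss gap (~1.39 nats) is much larger than the raw probability gap suggests, because -log p grows steeply as p approaches zero.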