Comparing Model Performance via Loss
A language model is being trained to predict the next word. For a given input, the ideal (target) probability distribution assigns a probability of 0.6 to the word 'sunny' and 0.3 to the word 'warm'; all other words share the remaining probability of 0.1.
Two student models, with parameter sets θ₁ and θ₂, produce the following distributions for the same input:
- Model A (θ₁): 'sunny' = 0.7, 'warm' = 0.1, others = 0.2
- Model B (θ₂): 'sunny' = 0.5, 'warm' = 0.4, others = 0.1
Based on a typical loss function that measures the discrepancy between the entire predicted and target distributions, which model is likely performing better (i.e., would have a lower loss value)? Justify your answer by explaining how the loss function evaluates these distributions.
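One way to reason about this concretely is a minimal sketch, assuming cross-entropy as the loss (a common choice for comparing a predicted distribution against a full target distribution; the lumped 'others' bucket is a simplification from the question):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) * log q(w); lower means q is closer to p."""
    return -sum(p[w] * math.log(q[w]) for w in p)

# Distributions from the question, with 'others' treated as one bucket.
target  = {"sunny": 0.6, "warm": 0.3, "others": 0.1}
model_a = {"sunny": 0.7, "warm": 0.1, "others": 0.2}
model_b = {"sunny": 0.5, "warm": 0.4, "others": 0.1}

loss_a = cross_entropy(target, model_a)
loss_b = cross_entropy(target, model_b)
print(f"Model A loss: {loss_a:.4f}")  # ~1.0657
print(f"Model B loss: {loss_b:.4f}")  # ~0.9210
```

Under this assumption, Model B scores lower: even though Model A assigns a higher probability to the single most likely word, it badly underestimates 'warm' (0.1 vs. the target's 0.3), and the loss penalizes the mismatch across the entire distribution.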
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sentence. For the input context 'The sun is shining...', the ideal (target) probability distribution gives a high probability to the word 'brightly'. The model's performance is measured by a loss function that compares the model's predicted probability distribution to the target distribution.
Consider two different sets of model parameters, θ₁ and θ₂:
- With parameters θ₁, the model's distribution predicts 'brightly' with a high probability.
- With parameters θ₂, the model's distribution predicts 'darkly' with a high probability.
Which of the following statements correctly analyzes the relationship between the parameters and the loss function for this specific input?
Interpreting a Model's Training Step
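The related question about θ₁ and θ₂ can be checked numerically with a short sketch. The probabilities below are hypothetical (the question only says each parameter set puts "high probability" on one word), and cross-entropy is assumed as the loss; parameters that concentrate probability on the correct word yield the lower loss.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) * log q(w); lower means q is closer to p."""
    return -sum(p[w] * math.log(q[w]) for w in p)

# Hypothetical numbers: the target favors 'brightly' after 'The sun is shining...'.
target = {"brightly": 0.9, "darkly": 0.05, "others": 0.05}
theta1 = {"brightly": 0.8, "darkly": 0.1, "others": 0.1}  # mass on the correct word
theta2 = {"brightly": 0.1, "darkly": 0.8, "others": 0.1}  # mass on the wrong word

loss_theta1 = cross_entropy(target, theta1)
loss_theta2 = cross_entropy(target, theta2)
print(f"loss(theta1) = {loss_theta1:.4f}")  # ~0.4311
print(f"loss(theta2) = {loss_theta2:.4f}")  # ~2.1986
```

So for this specific input, θ₁ gives a much lower loss than θ₂, which is exactly the relationship the question asks the reader to identify.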