Evaluating Model Performance via Loss
Based on the provided case study, which model (A or B) would have a lower value for the loss function, and why? Your explanation should connect the properties of the probability distributions to the purpose of the loss function in model training.
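As a rough illustration of how such a comparison can be computed, the sketch below assumes the loss is cross-entropy against a one-hot target, which the question does not state explicitly. The two distributions, the four-word vocabulary, and the names `cross_entropy`, `peaked`, and `spread` are hypothetical and are not taken from the case study.

```python
import math

def cross_entropy(predicted, true_index):
    """Cross-entropy against a one-hot target.

    With all true probability on one word, the sum over the vocabulary
    reduces to the negative log of the probability the model gave that word.
    """
    return -math.log(predicted[true_index])

# Hypothetical predictions over a 4-word vocabulary (assumed values).
# The true next word is at index 1 in both cases.
peaked = [0.05, 0.85, 0.05, 0.05]  # most probability mass on the true word
spread = [0.30, 0.25, 0.25, 0.20]  # probability mass spread across the vocabulary

loss_peaked = cross_entropy(peaked, 1)
loss_spread = cross_entropy(spread, 1)

print(f"peaked: {loss_peaked:.3f}  spread: {loss_spread:.3f}")
```

Under this assumption, the loss depends only on the probability assigned to the true word, so a prediction that concentrates mass on that word yields a smaller value than one that spreads mass elsewhere.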
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sequence. For a given training example, a loss function is used to measure the difference between the model's predicted probability distribution over the vocabulary and the true distribution, where the true next word has a probability of 1. If the calculated loss is very high for this example, what does this most accurately indicate?
Comparing Model Predictions via Loss