A language model is being trained to predict the next word in a sequence. For a given training example, a loss function is used to measure the difference between the model's predicted probability distribution over the vocabulary and the true distribution, in which the true next word has a probability of 1. If the calculated loss is very high for this example, what does this most accurately indicate?
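As a rough illustration of the setup described in the question, the sketch below assumes the loss is cross-entropy, which the question does not state explicitly. With a one-hot true distribution, cross-entropy reduces to the negative log of the probability the model assigned to the true next word. The three-word vocabulary, the probability values, and the function name `cross_entropy_one_hot` are all hypothetical.

```python
import math

def cross_entropy_one_hot(predicted, true_index):
    """Cross-entropy against a one-hot target (assumed loss, not given in the source).

    Because the true distribution puts probability 1 on a single word,
    the sum over the vocabulary collapses to -log(predicted[true_index]).
    """
    return -math.log(predicted[true_index])

# Hypothetical 3-word vocabulary; the true next word is at index 1.
true_index = 1
confident = [0.05, 0.90, 0.05]  # most probability mass on the true word
mistaken = [0.90, 0.01, 0.09]   # almost no probability mass on the true word

loss_confident = cross_entropy_one_hot(confident, true_index)
loss_mistaken = cross_entropy_one_hot(mistaken, true_index)

print(f"loss when the true word gets 0.90: {loss_confident:.3f}")
print(f"loss when the true word gets 0.01: {loss_mistaken:.3f}")
```

Under this assumption, the loss grows as the probability assigned to the true word shrinks, so comparing the two printed values shows how a single example's loss reflects the model's prediction for that example.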
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Model Performance via Loss
Comparing Model Predictions via Loss