Learn Before
Loss Function for Predicted vs. Gold Probability Distributions
The formula represents a loss function, Loss(p_θ, p_gold), that quantifies the difference between a model's predicted probability distribution, p_θ, and the ground-truth or "gold" probability distribution, p_gold. The predicted distribution is parameterized by θ, the model's parameters, which are updated during training. The goal of training is typically to minimize this loss, thereby making the predicted distribution as close as possible to the true distribution. The position subscript suggests this loss is often used in sequential contexts, such as predicting the next element in a sequence.
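Cross-entropy is the standard instance of such a loss when the gold distribution is one-hot (the true next word has probability 1). A minimal Python sketch, assuming a small four-word vocabulary and hypothetical example distributions:

```python
import math

def cross_entropy_loss(pred_dist, gold_dist):
    """Cross-entropy between a gold distribution and a predicted one.

    The loss is high exactly when the predicted probability assigned to
    the true outcome is low, which is what the question above probes.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(g * math.log(p + eps) for g, p in zip(gold_dist, pred_dist))

# Gold distribution: the true next word is index 2, with probability 1.
gold = [0.0, 0.0, 1.0, 0.0]

confident_right = [0.05, 0.05, 0.85, 0.05]  # mass on the true word -> low loss
confident_wrong = [0.85, 0.05, 0.05, 0.05]  # mass on a wrong word -> high loss

print(cross_entropy_loss(confident_right, gold))  # ~0.163
print(cross_entropy_loss(confident_wrong, gold))  # ~2.996
```

Because the gold distribution is one-hot, the sum collapses to -log of the probability the model assigned to the correct word; gradient updates to θ then push that probability upward.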

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Learn After
A language model is being trained to predict the next word in a sequence. For a given training example, the function Loss(p_θ, p_gold) is used to measure the difference between the model's predicted probability distribution over the vocabulary (p_θ) and the true distribution (p_gold), where the true next word has a probability of 1. If the calculated loss is very high for this example, what does this most accurately indicate?
Evaluating Model Performance via Loss
Comparing Model Predictions via Loss