Formula

Loss Function for Predicted vs. Gold Probability Distributions

The formula $\mathcal{L}(\mathbf{p}_{i+1}^{\theta}, \mathbf{p}_{i+1}^{\text{gold}})$ denotes a loss function that quantifies the difference between a model's predicted probability distribution, $\mathbf{p}_{i+1}^{\theta}$, and the ground-truth or "gold" probability distribution, $\mathbf{p}_{i+1}^{\text{gold}}$. The predicted distribution is parameterized by $\theta$, the model's parameters, which are updated during training. The goal of training is typically to minimize this loss, bringing the predicted distribution as close as possible to the gold distribution. The subscript $i+1$ marks the position being predicted, which suggests a sequential setting such as predicting the next element in a sequence.
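The text above does not fix a particular form for the loss. As a minimal sketch, assuming the loss is cross-entropy (a common choice for comparing a predicted distribution with a gold one, though not stated in the source), it could be computed as follows. The names `cross_entropy`, `p_theta`, `p_gold`, and the 4-token vocabulary are hypothetical and chosen only for illustration.

```python
import math

def cross_entropy(p_pred, p_gold, eps=1e-12):
    """Cross-entropy -sum_k p_gold[k] * log(p_pred[k]).

    Assumption: the loss L is cross-entropy; the source leaves its form open.
    eps guards against log(0) when the model assigns zero probability.
    """
    if len(p_pred) != len(p_gold):
        raise ValueError("distributions must have the same length")
    # Entries with zero gold probability contribute nothing and are skipped.
    return -sum(g * math.log(max(p, eps))
                for p, g in zip(p_pred, p_gold) if g > 0)

# Hypothetical 4-token vocabulary; the gold distribution is one-hot on index 2.
p_gold = [0.0, 0.0, 1.0, 0.0]
# Hypothetical model prediction for position i+1.
p_theta = [0.1, 0.2, 0.6, 0.1]

loss = cross_entropy(p_theta, p_gold)
print(loss)
```

With a one-hot gold distribution, this reduces to the negative log-probability the model assigns to the correct element, so the loss shrinks toward zero as the predicted probability of that element approaches 1.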

Updated 2025-10-08

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences