Loss Function for Conditional Probability Distributions (Loss(p(·|x), p_θ(·|x)))
This formula represents a generic loss function used to train a model. It measures the discrepancy between a target conditional probability distribution, denoted p(·|x), and a parameterized model's predicted distribution, p_θ(·|x), for a given input x. Training typically adjusts the parameters θ to minimize this loss, making the model's distribution p_θ(·|x) as close as possible to the target distribution p(·|x). This framework is common in tasks like knowledge distillation, where a 'student' model (s) learns from a 'teacher' model (t).
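One common instantiation of such a loss (an assumption here; the card does not fix a specific form) is the KL divergence between the target and model distributions. A minimal sketch over a small discrete output vocabulary, with hypothetical probability values:

```python
import math

def kl_loss(target, model):
    """KL divergence D(target || model) between two discrete distributions.

    Both arguments are lists of probabilities over the same outputs;
    the loss is 0 when the distributions match and grows as they diverge.
    """
    return sum(p * math.log(p / q) for p, q in zip(target, model) if p > 0)

# Hypothetical target p(.|x) and two candidate model distributions p_theta(.|x)
target = [0.7, 0.2, 0.1]
close  = [0.6, 0.3, 0.1]   # similar to the target -> small loss
far    = [0.1, 0.2, 0.7]   # very different from the target -> large loss
```

Minimizing this quantity over θ pushes the model distribution toward the target, which is exactly the training goal described above.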

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A machine learning team is developing a compact, efficient language model, which we'll call model 's'. The model's behavior is governed by a set of tunable weights, denoted by θ. For a given task, the model receives a simplified context input, c', and a latent variable, z, and then generates a probability distribution over all possible outputs. Which of the following expressions correctly represents this model's output probability distribution?
In the expression Pr_s(·|c′, z; θ), which describes a model's output probability distribution, match each symbol to its correct description.
Applying the Student Model Probability Notation
Learn After
A language model is being trained to predict the next word in a sentence. For the input context 'The sun is shining...', the ideal (target) probability distribution, denoted as p(·|c), gives a high probability to the word 'brightly'. The model's performance is measured by a loss function that compares the model's predicted probability distribution, p_θ(·|c), to the target distribution.
Consider two different sets of model parameters, θ₁ and θ₂:
- With parameters θ₁, the model's distribution predicts 'brightly' with a high probability.
- With parameters θ₂, the model's distribution predicts 'darkly' with a high probability.
Which of the following statements correctly analyzes the relationship between the parameters and the loss function for this specific input?
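The contrast between the two parameter settings can be checked numerically. A minimal sketch, assuming a cross-entropy loss and hypothetical probability values for the distributions under θ₁ and θ₂:

```python
import math

def cross_entropy(target, predicted):
    """Cross-entropy loss: -sum over words w of p(w) * log p_theta(w)."""
    return -sum(p * math.log(predicted[w]) for w, p in target.items() if p > 0)

# Hypothetical target distribution: 'brightly' should follow the context
target = {"brightly": 0.9, "darkly": 0.05, "softly": 0.05}

# Under theta_1 the model favors 'brightly'; under theta_2 it favors 'darkly'
pred_theta1 = {"brightly": 0.8, "darkly": 0.1, "softly": 0.1}
pred_theta2 = {"brightly": 0.1, "darkly": 0.8, "softly": 0.1}

loss_theta1 = cross_entropy(target, pred_theta1)
loss_theta2 = cross_entropy(target, pred_theta2)
# loss_theta1 is lower: the theta_1 distribution agrees with the target
```

Because the θ₁ distribution concentrates mass where the target does, its loss is lower, which is the comparison the question asks you to reason about.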
Interpreting a Model's Training Step
Comparing Model Performance via Loss