Comparing Model Performance via Loss
A language model is being trained to predict the next word. For a given input, the ideal (target) probability distribution assigns a probability of 0.6 to the word 'sunny' and 0.3 to the word 'warm'; all other words share the remaining probability of 0.1.
Two student models, with parameter sets θ₁ and θ₂, produce the following distributions for the same input:
- Model A (θ₁): 'sunny' = 0.7, 'warm' = 0.1, others = 0.2
- Model B (θ₂): 'sunny' = 0.5, 'warm' = 0.4, others = 0.1
Based on a typical loss function that measures the discrepancy between the entire predicted and target distributions, which model is likely performing better (i.e., would have a lower loss value)? Justify your answer by explaining how the loss function evaluates these distributions.
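One way to reason about this concretely is a minimal sketch, assuming cross-entropy as the loss (a common choice for comparing a predicted distribution against a full target distribution; the lumped 'others' bucket is a simplification from the question):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) * log q(w); lower means q is closer to p."""
    return -sum(p[w] * math.log(q[w]) for w in p)

# Distributions from the question, with 'others' treated as one bucket.
target  = {"sunny": 0.6, "warm": 0.3, "others": 0.1}
model_a = {"sunny": 0.7, "warm": 0.1, "others": 0.2}
model_b = {"sunny": 0.5, "warm": 0.4, "others": 0.1}

loss_a = cross_entropy(target, model_a)
loss_b = cross_entropy(target, model_b)
print(f"Model A loss: {loss_a:.4f}")  # ~1.0657
print(f"Model B loss: {loss_b:.4f}")  # ~0.9210
```

Under this assumption, Model B scores lower: even though Model A assigns a higher probability to the single most likely word, it badly underestimates 'warm' (0.1 vs. the target's 0.3), and the loss penalizes the mismatch across the entire distribution.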
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sentence. For the input context 'The sun is shining...', the ideal (target) probability distribution gives a high probability to the word 'brightly'. The model's performance is measured by a loss function that compares the model's predicted probability distribution to the target distribution.
Consider two different sets of model parameters, θ₁ and θ₂:
- With parameters θ₁, the model's distribution predicts 'brightly' with a high probability.
- With parameters θ₂, the model's distribution predicts 'darkly' with a high probability.
Which of the following statements correctly analyzes the relationship between the parameters and the loss function for this specific input?
Interpreting a Model's Training Step
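The related question about θ₁ and θ₂ can be checked numerically with a short sketch. The probabilities below are hypothetical (the question only says each parameter set puts "high probability" on one word), and cross-entropy is assumed as the loss; parameters that concentrate probability on the correct word yield the lower loss.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) * log q(w); lower means q is closer to p."""
    return -sum(p[w] * math.log(q[w]) for w in p)

# Hypothetical numbers: the target favors 'brightly' after 'The sun is shining...'.
target = {"brightly": 0.9, "darkly": 0.05, "others": 0.05}
theta1 = {"brightly": 0.8, "darkly": 0.1, "others": 0.1}  # mass on the correct word
theta2 = {"brightly": 0.1, "darkly": 0.8, "others": 0.1}  # mass on the wrong word

loss_theta1 = cross_entropy(target, theta1)
loss_theta2 = cross_entropy(target, theta2)
print(f"loss(theta1) = {loss_theta1:.4f}")  # ~0.4311
print(f"loss(theta2) = {loss_theta2:.4f}")  # ~2.1986
```

So for this specific input, θ₁ gives a much lower loss than θ₂, which is exactly the relationship the question asks the reader to identify.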