Short Answer

Comparing Model Performance via Loss

A language model is being trained to predict the next word. For a given input, the ideal target probability distribution, $Pr^t$, assigns a probability of 0.6 to the word 'sunny' and 0.3 to the word 'warm'. All other words have a combined probability of 0.1.

Two student models, with parameter sets $\theta_A$ and $\theta_B$, produce the following distributions for the same input:

  • Model A ($Pr_{\theta_A}^s$): 'sunny' = 0.7, 'warm' = 0.1, others = 0.2
  • Model B ($Pr_{\theta_B}^s$): 'sunny' = 0.5, 'warm' = 0.4, others = 0.1

Based on a typical loss function that measures the discrepancy between the entire predicted and target distributions, which model is likely performing better (i.e., would have a lower loss value)? Justify your answer by explaining how the loss function evaluates these distributions.
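
One way to check the comparison numerically is sketched below. It assumes the "typical loss function" is cross-entropy (or, equivalently for ranking the two models, KL divergence) between the target and predicted distributions, and it treats all "other" words as a single bucket. Both choices are illustrative assumptions, not something the question specifies, and the function names are hypothetical.

```python
import math

# Probabilities are ordered as: 'sunny', 'warm', all other words combined.
# Lumping "others" into one bucket is an assumption made for illustration.
target = [0.6, 0.3, 0.1]   # target distribution
model_a = [0.7, 0.1, 0.2]  # Model A's prediction
model_b = [0.5, 0.4, 0.1]  # Model B's prediction


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL divergence D(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


ce_a = cross_entropy(target, model_a)
ce_b = cross_entropy(target, model_b)
kl_a = kl_divergence(target, model_a)
kl_b = kl_divergence(target, model_b)

print(f"Model A: cross-entropy = {ce_a:.4f}, KL = {kl_a:.4f}")
print(f"Model B: cross-entropy = {ce_b:.4f}, KL = {kl_b:.4f}")
```

Under these assumptions, Model B comes out with the lower loss (cross-entropy of roughly 0.92 versus 1.07 for Model A). Model A assigns more probability to the top word 'sunny', but it is penalized heavily for giving 'warm' only 0.1 when the target is 0.3, because the loss weighs the whole distribution rather than only the most likely word.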


Updated 2025-10-08


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science