Short Answer

Comparing Model Predictions via Loss

A language model is tasked with predicting the next word in the sentence 'The cat sat on the ___'. The correct next word is 'mat'. Two different models, Model A and Model B, produce the following probability distributions for the next word:

  • Model A: P('mat') = 0.8, P('rug') = 0.1, P('floor') = 0.05, P('chair') = 0.05
  • Model B: P('mat') = 0.2, P('rug') = 0.3, P('floor') = 0.3, P('chair') = 0.2

Based on the principle of minimizing the difference between the predicted and the true probability distributions, which model performs better on this specific example? Explain your reasoning.
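
One way to check an answer numerically is sketched below. It assumes the true distribution puts all of its probability on 'mat' (a one-hot target) and that the "difference" is measured with cross-entropy using the natural logarithm; under those assumptions the loss reduces to -log P('mat'). The helper name cross_entropy is illustrative and not part of the original question.

```python
import math

# Predicted distributions copied from the question.
model_a = {"mat": 0.8, "rug": 0.1, "floor": 0.05, "chair": 0.05}
model_b = {"mat": 0.2, "rug": 0.3, "floor": 0.3, "chair": 0.2}


def cross_entropy(predicted, correct_word):
    """Cross-entropy against a one-hot target: -log P(correct_word).

    Assumption: the true distribution assigns probability 1 to
    correct_word, so every other term of the sum vanishes.
    """
    return -math.log(predicted[correct_word])


loss_a = cross_entropy(model_a, "mat")
loss_b = cross_entropy(model_b, "mat")

print(f"Model A loss: {loss_a:.4f}")  # about 0.2231
print(f"Model B loss: {loss_b:.4f}")  # about 1.6094
print("Lower loss:", "Model A" if loss_a < loss_b else "Model B")
```

Under these assumptions Model A has the lower loss, because it assigns far more probability to the correct word. How the remaining probability is spread over 'rug', 'floor', and 'chair' does not affect the result when the target is one-hot.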

Updated 2025-10-08

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science