Case Study

Evaluating Model Performance via Cross-Entropy Loss

A language model is being trained to predict a masked token. For a specific training instance, the correct token is 'river'. Two different models, Model A and Model B, produce the probability distributions shown below for the masked position. Based on the goal of minimizing cross-entropy loss, which model is performing better on this specific instance? Justify your answer by explaining how the loss is calculated in this scenario.
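Since the original probability tables are not reproduced here, the sketch below uses hypothetical distributions for Model A and Model B to show how the loss would be computed. With a one-hot target (the single correct token 'river'), cross-entropy reduces to the negative log-probability the model assigns to that token, so the model placing more mass on 'river' incurs the lower loss.

```python
import math

# Hypothetical distributions (illustrative values only; the case study's
# actual tables are not reproduced here). Each maps candidate tokens to
# the model's predicted probability for the masked position.
model_a = {"river": 0.7, "bank": 0.2, "lake": 0.1}
model_b = {"river": 0.3, "bank": 0.5, "lake": 0.2}

def cross_entropy(dist, correct_token):
    # One-hot target: loss = -log p(correct token).
    return -math.log(dist[correct_token])

loss_a = cross_entropy(model_a, "river")  # -log(0.7) ≈ 0.357
loss_b = cross_entropy(model_b, "river")  # -log(0.3) ≈ 1.204
# Under these assumed values, Model A's lower loss would make it the
# better-performing model on this instance.
```

Probabilities assigned to incorrect tokens do not enter the loss directly; they matter only insofar as they take mass away from the correct token.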

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Ch.4 Alignment - Foundations of Large Language Models, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science