1Cademy - Evaluating Model Performance via Cross-Entropy Loss

Learn Before

MLM Training Objective using Cross-Entropy Loss

Case Study


A language model is being trained to predict a masked token. For a specific training instance, the correct token is 'river'. Two different models, Model A and Model B, produce the probability distributions shown below for the masked position. Based on the goal of minimizing cross-entropy loss, which model is performing better on this specific instance? Justify your answer by explaining how the loss is calculated in this scenario.
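At a single masked position, the target is a one-hot vector on the correct token. The cross-entropy loss, -sum_v y_v * log p_v, therefore reduces to -log p(river). Every other term is multiplied by zero. The model that assigns the higher probability to 'river' gets the lower loss, and so performs better on this instance.

The sketch below illustrates this under a few stated assumptions. The two distributions are hypothetical placeholders, not the values from the case study. The helper name `masked_token_cross_entropy` is illustrative. The natural logarithm is used, and changing the log base rescales both losses equally without changing which one is lower.

```python
import math
from typing import Dict


def masked_token_cross_entropy(probs: Dict[str, float], target: str) -> float:
    """Cross-entropy loss at one masked position with a one-hot target.

    Only the correct token's term survives in -sum_v y_v * log(p_v),
    so the loss is simply -log(p(target)).
    """
    p = probs.get(target, 0.0)
    if p <= 0.0:
        # Zero probability on the correct token gives an infinite loss.
        return math.inf
    return -math.log(p)


# Hypothetical distributions over a tiny vocabulary (NOT the case-study values).
model_a = {"river": 0.70, "bank": 0.20, "stream": 0.10}
model_b = {"river": 0.20, "bank": 0.60, "stream": 0.20}

loss_a = masked_token_cross_entropy(model_a, "river")
loss_b = masked_token_cross_entropy(model_b, "river")

# Lower cross-entropy means the model put more probability on the correct token.
better = "Model A" if loss_a < loss_b else "Model B"

print(f"Model A loss: {loss_a:.3f}")
print(f"Model B loss: {loss_b:.3f}")
print(f"Better on this instance: {better}")
```

Only the probability assigned to 'river' enters the loss. How the remaining probability mass is spread over 'bank', 'stream', and the other tokens does not affect the result for this instance.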

Updated 2025-10-03
