Learn Before
A language model is being trained to fill in a masked word. For the input 'The cat sat on the [MASK]', the correct word is 'mat'. The training objective is to adjust the model to minimize the cross-entropy loss for its predictions. Below are four different potential outputs from the model, showing the probability it assigns to the word 'mat'. Which of these outputs would result in the LOWEST loss for this specific training example?
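A quick way to verify the intuition: for a single-token prediction with a one-hot target, cross-entropy loss reduces to the negative log of the probability assigned to the correct token, so the output giving 'mat' the highest probability yields the lowest loss. A minimal sketch in plain Python (the four probabilities below are illustrative assumptions, not the question's actual answer options):

import math

# Hypothetical probabilities the model might assign to the correct token 'mat'.
# These values are illustrative; the question's actual options are not shown here.
candidate_probs = [0.10, 0.35, 0.60, 0.95]

for p in candidate_probs:
    # Cross-entropy with a one-hot target reduces to -log p(correct token).
    loss = -math.log(p)
    print(f"p(mat) = {p:.2f}  ->  loss = {loss:.4f}")

# The loss shrinks as p('mat') grows: the highest probability
# (0.95 here) gives the lowest loss, about 0.0513.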
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability of a True Token in MLM
Predicted Probability Distribution in MLM
Example of MLM Training Objective with Multiple Masks
MLM Loss Function as Negative Log-Likelihood
Evaluating Model Performance via Cross-Entropy Loss
According to the standard Masked Language Modeling (MLM) training objective, a model's parameters are adjusted based on the cross-entropy loss computed over all masked tokens in a training batch. The masked positions are selected at random (typically around 15% of tokens) rather than strategically, and minimizing the summed cross-entropy is equivalent to maximizing the log-likelihood of the true tokens at those positions.
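As a sketch of how this objective is typically computed in practice, here is a minimal PyTorch example (the tensor shapes, the toy values, and the -100 ignore_index convention are assumptions for illustration, not part of this card):

import torch
import torch.nn.functional as F

# Toy setup: one sequence of 6 positions over a 10-token vocabulary.
vocab_size = 10
logits = torch.randn(1, 6, vocab_size)  # model scores at every position
# True token ids only at the masked positions; -100 marks unmasked tokens.
labels = torch.tensor([[-100, -100, 3, -100, 7, -100]])

# Cross-entropy over all masked positions at once; ignore_index=-100
# skips the unmasked tokens (a common Hugging Face convention).
loss = F.cross_entropy(
    logits.view(-1, vocab_size),  # (batch * seq_len, vocab)
    labels.view(-1),              # (batch * seq_len,)
    ignore_index=-100,
)
print(loss.item())  # average negative log-likelihood over the masked tokens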