Multiple Choice

A language model is given an input sequence where one token has been replaced by a [MASK] token. The original, correct token for that position was 'fox'. After processing the input, the model outputs the following probability distribution for the masked position:

  • P('fox') = 0.7
  • P('cat') = 0.2
  • P('dog') = 0.1

If the training objective for this single token is to minimize the negative natural logarithm of the probability of the correct token, what is the calculated loss value for this instance? (Use ln for natural logarithm)
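The arithmetic can be checked with a short Python sketch (using only the standard `math` module; the variable names here are illustrative, not from the original question):

```python
import math

# Model's predicted probability for the correct token 'fox'
p_correct = 0.7

# Per-token cross-entropy loss: negative natural log of P(correct token)
loss = -math.log(p_correct)

print(round(loss, 4))  # prints 0.3567
```

So the loss is -ln(0.7) ≈ 0.357; note that the loss would be exactly 0 only if the model assigned probability 1.0 to the correct token.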

Updated 2025-09-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course
