Learn Before
Calculating Total MLM Loss for a Sequence
A language model processes an input sequence with two masked positions. The original tokens for these positions were 'apple' and 'banana'. The model's predicted probabilities for the correct tokens at these positions are P('apple') = 0.5 and P('banana') = 0.2. Using the negative log-likelihood loss function (with a natural logarithm, ln), calculate the total loss for this sequence based only on these two masked positions. Provide your answer rounded to three decimal places.
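The total MLM loss over masked positions is the sum of the per-token negative log-likelihoods. A minimal Python check of this arithmetic (the probabilities 0.5 and 0.2 come from the question above):

```python
import math

# Model's predicted probabilities for the correct tokens at the two masked positions
p_apple = 0.5
p_banana = 0.2

# Negative log-likelihood per masked token, summed over the sequence
total_loss = -math.log(p_apple) - math.log(p_banana)

print(round(total_loss, 3))  # -ln(0.5) - ln(0.2) = 0.693 + 1.609 = 2.303
```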
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is given an input sequence where one token has been replaced by a [MASK] token. The original, correct token for that position was 'fox'. After processing the input, the model outputs the following probability distribution for the masked position:
- P('fox') = 0.7
- P('cat') = 0.2
- P('dog') = 0.1
If the training objective for this single token is to minimize the negative natural logarithm of the probability of the correct token, what is the calculated loss value for this instance? (Use ln for natural logarithm)
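The loss for a single masked token is just the negative natural log of the probability the model assigned to the correct token. A quick Python check using the distribution above (only P('fox') matters for the loss):

```python
import math

p_correct = 0.7  # model's probability for the true token 'fox'

loss = -math.log(p_correct)
print(round(loss, 3))  # -ln(0.7) ≈ 0.357
```

Note that the probabilities assigned to 'cat' and 'dog' do not enter the loss directly; they affect it only through the normalization that produced P('fox').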
Two language models, Model A and Model B, are tasked with predicting a masked token in a sentence. The correct, original token is 'river'.
Model A's predicted probabilities for the masked position include:
- P('river') = 0.3
- P('stream') = 0.4
- P('water') = 0.2
Model B's predicted probabilities for the masked position include:
- P('river') = 0.01
- P('mountain') = 0.95
- P('sky') = 0.02
Based on the standard negative log-likelihood loss function used for this task, which statement accurately compares the calculated loss for this single prediction?
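A sketch of the comparison in Python, using the two models' probabilities for the correct token 'river' from above. Because negative log-likelihood grows without bound as the probability approaches zero, Model B's near-zero P('river') yields a much larger loss:

```python
import math

loss_a = -math.log(0.3)   # Model A: P('river') = 0.3
loss_b = -math.log(0.01)  # Model B: P('river') = 0.01

print(round(loss_a, 3))  # ≈ 1.204
print(round(loss_b, 3))  # ≈ 4.605
print(loss_b > loss_a)   # True: Model B is penalized far more heavily
```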
Calculating Total MLM Loss for a Sequence
Running Example of Computing MLM Loss