Learn Before
Calculating Total MLM Loss for a Sequence
A language model processes an input sequence with two masked positions. The original tokens for these positions were 'apple' and 'banana'. The model's predicted probabilities for the correct tokens at these positions are P('apple') = 0.5 and P('banana') = 0.2. Using the negative log-likelihood loss function (with a natural logarithm, ln), calculate the total loss for this sequence based only on these two masked positions. Provide your answer rounded to three decimal places.
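The total MLM loss over masked positions is the sum of the per-token negative log-likelihoods. A minimal Python check of this arithmetic (the probabilities 0.5 and 0.2 come from the question above):

```python
import math

# Model's predicted probabilities for the correct tokens at the two masked positions
p_apple = 0.5
p_banana = 0.2

# Negative log-likelihood per masked token, summed over the sequence
total_loss = -math.log(p_apple) - math.log(p_banana)

print(round(total_loss, 3))  # -ln(0.5) - ln(0.2) = 0.693 + 1.609 = 2.303
```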
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is given an input sequence where one token has been replaced by a [MASK] token. The original, correct token for that position was 'fox'. After processing the input, the model outputs the following probability distribution for the masked position:
- P('fox') = 0.7
- P('cat') = 0.2
- P('dog') = 0.1
If the training objective for this single token is to minimize the negative natural logarithm of the probability of the correct token, what is the calculated loss value for this instance? (Use ln for natural logarithm)
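The loss for a single masked token is just the negative natural log of the probability the model assigned to the correct token. A quick Python check using the distribution above (only P('fox') matters for the loss):

```python
import math

p_correct = 0.7  # model's probability for the true token 'fox'

loss = -math.log(p_correct)
print(round(loss, 3))  # -ln(0.7) ≈ 0.357
```

Note that the probabilities assigned to 'cat' and 'dog' do not enter the loss directly; they affect it only through the normalization that produced P('fox').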
Two language models, Model A and Model B, are tasked with predicting a masked token in a sentence. The correct, original token is 'river'.
Model A's predicted probabilities for the masked position include:
- P('river') = 0.3
- P('stream') = 0.4
- P('water') = 0.2
Model B's predicted probabilities for the masked position include:
- P('river') = 0.01
- P('mountain') = 0.95
- P('sky') = 0.02
Based on the standard negative log-likelihood loss function used for this task, which statement accurately compares the calculated loss for this single prediction?
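A sketch of the comparison in Python, using the two models' probabilities for the correct token 'river' from above. Because negative log-likelihood grows without bound as the probability approaches zero, Model B's near-zero P('river') yields a much larger loss:

```python
import math

loss_a = -math.log(0.3)   # Model A: P('river') = 0.3
loss_b = -math.log(0.01)  # Model B: P('river') = 0.01

print(round(loss_a, 3))  # ≈ 1.204
print(round(loss_b, 3))  # ≈ 4.605
print(loss_b > loss_a)   # True: Model B is penalized far more heavily
```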
Calculating Total MLM Loss for a Sequence
Running Example of Computing MLM Loss