Multiple Choice

A language model is given an input sequence where one token has been replaced by a [MASK] token. The original, correct token for that position was 'fox'. After processing the input, the model outputs the following probability distribution for the masked position:

  • P('fox') = 0.7
  • P('cat') = 0.2
  • P('dog') = 0.1

If the training objective for this single token is to minimize the negative natural logarithm of the probability of the correct token, what is the calculated loss value for this instance? (Use ln for natural logarithm)
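The arithmetic can be checked with a short Python sketch (using only the standard `math` module; the variable names here are illustrative, not from the original question):

```python
import math

# Model's predicted probability for the correct token 'fox'
p_correct = 0.7

# Per-token cross-entropy loss: negative natural log of P(correct token)
loss = -math.log(p_correct)

print(round(loss, 4))  # prints 0.3567
```

So the loss is -ln(0.7) ≈ 0.357; note that the loss would be exactly 0 only if the model assigned probability 1.0 to the correct token.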

Updated 2025-09-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course
