Learn Before
A masked language model processes the input 'The chef carefully seasoned the [MASK] before serving.' For the masked position, the model generates a probability distribution over its entire 30,000-word vocabulary. The word 'soup' is assigned a probability of 0.6, 'dish' is assigned 0.2, and the remaining probability is spread thinly across the other 29,998 words. If the original, unmasked word was 'soup', which of the following statements provides the most accurate analysis of this outcome?
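The card's numbers can be reproduced with a small worked sketch. Below is a minimal Python example, using a hypothetical six-word vocabulary and contrived logits (both are illustrative assumptions, not the model's real 30,000-word vocabulary), showing how a softmax over the masked position's logits yields a distribution in which 'soup' receives 0.6, 'dish' receives 0.2, and the remaining mass is spread thinly over the rest.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for the model's full 30,000-word vocabulary,
# with logits contrived so the softmax reproduces the card's 0.6 / 0.2 split.
vocab = ["soup", "dish", "sauce", "table", "guest", "plate"]
logits = np.array([2.485, 1.386, 0.0, 0.0, 0.0, 0.0])

# Softmax converts raw logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word:>6}: {p:.3f}")   # soup ~0.600, dish ~0.200, others ~0.050 each

# Any valid distribution is non-negative and sums to 1.
print("sum =", probs.sum())
```

Assigning 0.6 to the original word 'soup' means the model concentrates most, but not all, of its probability mass on the correct token while still treating plausible alternatives like 'dish' as live possibilities.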
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting a Model's Output Distribution
A language model with a small vocabulary consisting of only four words ('cat', 'sat', 'on', 'mat') is given the input sequence 'the [MASK] sat on the mat'. The model's task is to predict the masked token. Which of the following options represents a valid predicted probability distribution for the masked position?
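A predicted distribution over the four-word vocabulary is valid only if every probability is non-negative and the probabilities sum to 1. The sketch below checks those two properties for some hypothetical candidate distributions (the labels and values are illustrative assumptions, not the card's actual answer options).

```python
# Minimal check of the two defining properties of a probability distribution:
# every entry is non-negative, and the entries sum to 1 over the whole vocabulary.
def is_valid_distribution(probs, tol=1e-9):
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol

vocab = ["cat", "sat", "on", "mat"]

candidates = {
    "A": [0.7, 0.1, 0.1, 0.1],   # valid: non-negative, sums to 1
    "B": [0.5, 0.5, 0.5, -0.5],  # invalid: contains a negative probability
    "C": [0.4, 0.3, 0.2, 0.05],  # invalid: sums to 0.95, not 1
}

for name, probs in candidates.items():
    print(name, dict(zip(vocab, probs)), "->", is_valid_distribution(probs))
```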