Learn Before
A masked language model processes the input 'The chef carefully seasoned the [MASK] before serving.' For the masked position, the model generates a probability distribution over its entire 30,000-word vocabulary. The word 'soup' is assigned a probability of 0.6, 'dish' is assigned 0.2, and the remaining probability is spread thinly across the other 29,998 words. If the original, unmasked word was 'soup', which of the following statements provides the most accurate analysis of this outcome?
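The card's numbers can be reproduced with a small worked sketch. Below is a minimal Python example, using a hypothetical six-word vocabulary and contrived logits (both are illustrative assumptions, not the model's real 30,000-word vocabulary), showing how a softmax over the masked position's logits yields a distribution in which 'soup' receives 0.6, 'dish' receives 0.2, and the remaining mass is spread thinly over the rest.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for the model's full 30,000-word vocabulary,
# with logits contrived so the softmax reproduces the card's 0.6 / 0.2 split.
vocab = ["soup", "dish", "sauce", "table", "guest", "plate"]
logits = np.array([2.485, 1.386, 0.0, 0.0, 0.0, 0.0])

# Softmax converts raw logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word:>6}: {p:.3f}")   # soup ~0.600, dish ~0.200, others ~0.050 each

# Any valid distribution is non-negative and sums to 1.
print("sum =", probs.sum())
```

Assigning 0.6 to the original word 'soup' means the model concentrates most, but not all, of its probability mass on the correct token while still treating plausible alternatives like 'dish' as live possibilities.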
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting a Model's Output Distribution
A language model with a small vocabulary consisting of only four words ('cat', 'sat', 'on', 'mat') is given the input sequence 'the [MASK] sat on the mat'. The model's task is to predict the masked token. Which of the following options represents a valid predicted probability distribution for the masked position?
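A predicted distribution over the four-word vocabulary is valid only if every probability is non-negative and the probabilities sum to 1. The sketch below checks those two properties for some hypothetical candidate distributions (the labels and values are illustrative assumptions, not the card's actual answer options).

```python
# Minimal check of the two defining properties of a probability distribution:
# every entry is non-negative, and the entries sum to 1 over the whole vocabulary.
def is_valid_distribution(probs, tol=1e-9):
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol

vocab = ["cat", "sat", "on", "mat"]

candidates = {
    "A": [0.7, 0.1, 0.1, 0.1],   # valid: non-negative, sums to 1
    "B": [0.5, 0.5, 0.5, -0.5],  # invalid: contains a negative probability
    "C": [0.4, 0.3, 0.2, 0.05],  # invalid: sums to 0.95, not 1
}

for name, probs in candidates.items():
    print(name, dict(zip(vocab, probs)), "->", is_valid_distribution(probs))
```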