Multiple Choice

A masked language model processes the input 'The chef carefully seasoned the [MASK] before serving.' For the masked position, the model generates a probability distribution over its entire 30,000-word vocabulary. The word 'soup' is assigned a probability of 0.6, 'dish' is assigned 0.2, and the remaining probability is spread thinly across the other 29,998 words. If the original, unmasked word was 'soup', which of the following statements provides the most accurate analysis of this outcome?
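The numbers in the question can be checked with a short calculation. The sketch below assumes the model is trained with the standard cross-entropy (negative log-likelihood) objective at the masked position, which the question does not state explicitly. The variable names are illustrative, and the uniform-guess baseline is included only for comparison.

```python
import math

# Values taken from the question.
vocab_size = 30_000
p_soup = 0.6   # probability assigned to the correct word
p_dish = 0.2   # probability assigned to a plausible alternative

# Probability mass left for the other 29,998 words.
remaining_mass = 1.0 - p_soup - p_dish
avg_other = remaining_mass / (vocab_size - 2)

# Assumed objective: cross-entropy with a one-hot target on 'soup',
# which reduces to the negative log of the probability given to 'soup'.
loss = -math.log(p_soup)

# Comparison baseline (not from the question): loss of a model that
# guesses uniformly over the whole vocabulary.
uniform_loss = math.log(vocab_size)

print(f"remaining mass:         {remaining_mass:.2f}")
print(f"average per other word: {avg_other:.2e}")
print(f"loss at masked slot:    {loss:.3f}")
print(f"uniform-guess loss:     {uniform_loss:.3f}")
```

Under these assumptions the loss at the masked position is about 0.51, well below the roughly 10.31 of a uniform guess but still above zero, because the model placed only 0.6 of the probability mass on the correct word.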


Updated 2025-09-26


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science