
Learn Before

Probability of a True Token in MLM

Multiple Choice

A masked language model is given the input sequence: 'The quick brown [MASK] jumps over the lazy dog.' The original, unmasked token at the [MASK] position was 'fox'. Two different versions of the model, Model A and Model B, are used to predict the masked token.

- Model A assigns a probability of 0.85 to the token 'fox'.
- Model B assigns a probability of 0.15 to the token 'fox', and its highest predicted probability is 0.40 for the token 'cat'.

Based on the probability assigned to the *corre
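
As a rough illustration of how the two probabilities for 'fox' can be compared, the sketch below computes the negative log-likelihood of the true token at the masked position. It assumes the model is scored with the usual cross-entropy objective, which the page itself does not state; the helper name `true_token_nll` and the dictionary `p_fox` are hypothetical names introduced only for this example.

```python
import math

def true_token_nll(p_true: float) -> float:
    """Negative log-likelihood of the original token at one masked position.

    Assumption: the standard cross-entropy objective, where only the
    probability assigned to the true token enters the per-position loss.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p_true)

# Probabilities assigned to the true token 'fox', taken from the question.
p_fox = {"Model A": 0.85, "Model B": 0.15}

nll = {name: true_token_nll(p) for name, p in p_fox.items()}

for name, value in nll.items():
    print(f"{name}: p('fox') = {p_fox[name]:.2f}, loss = {value:.3f}")
```

Under this assumption, a higher probability on 'fox' gives a lower loss at that position. Model B's 0.40 for 'cat' does not appear in the calculation, because only the probability of the true token is used.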

Updated 2025-09-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course