A language model is being trained with the objective of maximizing the log-probability of the original tokens at masked positions. For the original sentence 'The fox jumps over the dog', the model is given the masked input 'The fox [MASK] over the dog'. Which of the following model predictions for the [MASK] token would contribute the most to achieving the training objective for this specific instance?
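The answer choices are not reproduced here, but the principle can be illustrated with a minimal Python sketch using hypothetical predicted distributions (the probabilities below are made up, not the original options): the prediction that assigns the highest probability to the original token 'jumps' contributes the largest log-probability, and hence the most toward the training objective.

```python
import math

# Hypothetical predicted distributions over the [MASK] position.
# These are illustrative stand-ins for the question's answer choices.
candidates = {
    "A": {"jumps": 0.7, "runs": 0.2, "leaps": 0.1},
    "B": {"jumps": 0.3, "runs": 0.5, "leaps": 0.2},
    "C": {"jumps": 0.1, "runs": 0.1, "leaps": 0.8},
}

original_token = "jumps"

# The per-instance contribution to the MLM objective is
# log Pr(original token | masked input); higher is better.
for name, dist in candidates.items():
    print(name, math.log(dist[original_token]))
```

Candidate A, which places the most probability mass on the original token, yields the least negative log-probability and thus the largest contribution.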
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Application in Bloom's Taxonomy
Related
MLM Training Objective using Cross-Entropy Loss
In the context of training a language model, the objective is often to find parameters that maximize the likelihood of the training data. Consider the following mathematical expression for this objective:
Objective = ∑_{x ∈ D} ∑_{i ∈ A(x)} log Pr(xᵢ | x̄)

Here, D is the dataset, x is an original text sequence, x̄ is a version of x with some tokens masked, A(x) is the set of indices that were masked in x, and xᵢ is the original token at a masked position i. What does the inner summation, ∑_{i ∈ A(x)} log Pr(xᵢ | x̄), represent in this training process?
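As a concrete illustration of the formula above, here is a minimal Python sketch over a toy dataset (the sequences and probabilities are made up; in practice Pr(xᵢ | x̄) would come from the model). The inner loop computes the inner summation, the per-sequence total log-probability of the original tokens at masked positions, and the outer loop sums these contributions over the dataset.

```python
import math

# Toy dataset: each entry records the original tokens, the masked
# indices A(x), and a hypothetical model probability Pr(x_i | x_bar)
# for the original token at each masked position.
dataset = [
    {"tokens": ["The", "fox", "jumps", "over", "the", "dog"],
     "masked": {2: 0.6}},          # A(x) = {2}
    {"tokens": ["A", "cat", "sleeps", "on", "the", "mat"],
     "masked": {1: 0.4, 4: 0.9}},  # A(x) = {1, 4}
]

objective = 0.0
for x in dataset:
    # Inner summation: log-probabilities of the original tokens
    # at this sequence's masked positions.
    inner = sum(math.log(p) for p in x["masked"].values())
    objective += inner  # outer summation over the dataset

print(objective)
```

The inner summation is therefore this sequence's total contribution to the objective: the joint log-probability the model assigns to recovering all of its masked tokens.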
Calculating Contribution to MLM Training Objective
Example of Masked Language Modeling Loss Calculation