Case Study

Calculating Contribution to MLM Training Objective

A language model is being trained on the original sentence 'The quick brown fox.' During one training step, the model receives the masked input 'The quick [MASK] fox.' and produces the following probability distribution for the masked position:

  • P('brown' | 'The quick [MASK] fox') = 0.7
  • P('red' | 'The quick [MASK] fox') = 0.2
  • P('lazy' | 'The quick [MASK] fox') = 0.1

Based on the maximum likelihood estimation (MLE) objective used in masked language model (MLM) training, calculate the log-probability that this single masked token contributes to the total training objective. Explain what this value signifies for the model's training process.
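The computation can be sketched in a few lines. This assumes the standard convention of a natural logarithm; only the probability assigned to the correct token ('brown') enters the objective for this position:

```python
import math

# Probability the model assigns to the correct token 'brown'
p_correct = 0.7

# The MLM/MLE objective sums log-probabilities of the correct tokens,
# so this masked position contributes log P('brown' | context).
contribution = math.log(p_correct)
print(f"log P('brown' | context) = {contribution:.4f}")  # ≈ -0.3567
```

Because 0 < P('brown') < 1, the contribution is negative; maximizing the total objective (equivalently, minimizing the negative log-likelihood of about 0.3567) drives the model to push P('brown' | context) toward 1. Note that if base-2 logarithms were used instead, the value would be about -0.5146.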

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models
