Concept

Training Objective of Masked Language Modeling (MLM)

Given an original text sequence $\mathbf{x}$ and its corrupted version $\bar{\mathbf{x}}$, training a model to predict $\mathbf{x}$ from $\bar{\mathbf{x}}$ can be viewed as an autoencoding-like process. The basic training objective is to maximize the reconstruction probability $\Pr(\mathbf{x} \mid \bar{\mathbf{x}})$. However, since the two sequences are aligned position by position, an unmasked token in $\bar{\mathbf{x}}$ is identical to the token in $\mathbf{x}$ at the same position, so predicting it is trivial. The training objective therefore simplifies to maximizing the probabilities of the masked tokens only, i.e., $\sum_{i \in \mathcal{M}} \log \Pr(x_i \mid \bar{\mathbf{x}})$, where $\mathcal{M}$ is the set of masked positions.
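To make the simplification concrete, below is a minimal PyTorch-style sketch (not from the text): the helper names `mlm_labels` and `mlm_loss` and the `-100` ignore-index convention are illustrative assumptions. The point is that the cross-entropy sum runs only over masked positions, matching the simplified objective above.

```python
import torch
import torch.nn.functional as F

def mlm_labels(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep original token ids at masked positions; ignore the rest.

    x:    (batch, seq_len) original token ids from x.
    mask: (batch, seq_len) boolean, True where a token was corrupted.
    """
    labels = x.clone()
    labels[~mask] = -100  # unmasked tokens carry no training signal
    return labels

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood over masked positions only.

    logits: (batch, seq_len, vocab_size) model predictions given x_bar.
    """
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # drops every unmasked position from the sum
    )
```

Minimizing this loss is equivalent to maximizing $\sum_{i \in \mathcal{M}} \log \Pr(x_i \mid \bar{\mathbf{x}})$: the unmasked positions are excluded from the sum rather than predicted.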
