Learn Before
  • Masked Language Modeling (MLM) as a Pre-training Task

Comparison of Masked vs. Causal Language Modeling

Causal Language Modeling (CLM), also known as conventional language modeling, can be viewed as a special case of Masked Language Modeling (MLM). In CLM, the token at a given position is predicted only from its preceding tokens (the left context); the entire right context is effectively masked, making the process unidirectional. In contrast, the general form of MLM is bidirectional: it uses all unmasked tokens, from both the left and right contexts, to predict each masked token in the sequence.
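
To make the difference concrete, here is a minimal NumPy sketch of the boolean visibility matrix each objective implies: a lower-triangular causal mask for CLM versus a bidirectional mask for MLM that hides only the masked positions. The example sentence, the masked position, and all variable names are illustrative assumptions, not taken from any particular model or library.

```python
import numpy as np

tokens = ["the", "cat", "sat", "on", "the", "mat"]
T = len(tokens)

# CLM: to predict position i, only positions j < i (the left context) are
# visible -- the standard lower-triangular causal mask.
clm_visible = np.tril(np.ones((T, T), dtype=bool), k=-1)

# MLM: hide a chosen position (index 2, "sat" -- an illustrative choice) and
# predict it from ALL unmasked positions, left and right.
masked_positions = {2}
mlm_visible = np.ones((T, T), dtype=bool)
for j in masked_positions:
    mlm_visible[:, j] = False  # masked tokens contribute no content

print("CLM context for position 3 ('on'):",
      [tokens[j] for j in range(T) if clm_visible[3, j]])
# -> ['the', 'cat', 'sat']  (left context only)

print("MLM context for masked position 2 ('sat'):",
      [tokens[j] for j in range(T) if mlm_visible[2, j]])
# -> ['the', 'cat', 'on', 'the', 'mat']  (both sides, excluding the mask)
```

Reading the two matrices side by side shows the relationship stated above: setting every entry to the right of the diagonal to False turns the general MLM mask into the CLM mask, which is why CLM can be treated as MLM with the whole right context masked.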

Related
  • Formal Definition of the Masking Process in MLM

  • Example of Masked Language Modeling with Single and Multiple Masks

  • Training Objective of Masked Language Modeling (MLM)

  • Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

  • Limitation of MLM: Ignoring Dependencies Between Masked Tokens

  • The Generator in Replaced Token Detection

  • Example Sentence for Masking and Reconstruction Task

  • Generalization of MLM via Masking Percentage

  • Consecutive Token Masking in MLM

  • BERT's Training Objective and Innovations

  • Token Selection and Modification Strategy in BERT's MLM

  • BERT's Masked Language Modeling Pre-training Pipeline

  • Performance Degradation and Early Stopping in Pre-training