Comparison of Masked vs. Causal Language Modeling
Causal Language Modeling (CLM), also known as conventional language modeling, can be understood as a special case of Masked Language Modeling (MLM): when predicting the token at a given position, every token in the right-hand context is treated as masked, so the model must rely exclusively on the preceding left-hand context. This makes CLM a unidirectional process. General MLM, by contrast, is bidirectional: it predicts a masked token using all unmasked tokens in the sequence, drawn from both the left and right contexts.
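To make the contrast concrete, here is a minimal sketch in plain NumPy of the context-visibility pattern each objective implies. The sequence length and the choice of masked position are illustrative assumptions, not values from the course; the point is only that CLM's "mask everything to the right" rule yields a lower-triangular visibility matrix, while general MLM keeps full bidirectional visibility and instead replaces the predicted token in the input.

```python
import numpy as np

seq_len = 5  # illustrative size, not from the source

# CLM visibility: position i may only see positions j <= i.
# Masking the entire right-hand context at every position produces
# a lower-triangular matrix, i.e. strictly left-to-right prediction.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# General MLM visibility: attention stays fully bidirectional.
# The token(s) to be predicted are replaced by [MASK] in the *input*
# rather than hidden from the attention pattern.
masked_positions = [3]  # hypothetical choice of masked token
mlm_visibility = np.ones((seq_len, seq_len), dtype=bool)

print("CLM mask (True = visible to the predicting position):")
print(causal_mask)
print("MLM: all positions visible; input token(s) at",
      masked_positions, "replaced by [MASK] before encoding.")
```

Viewed this way, CLM is simply the MLM visibility matrix with everything above the diagonal forced to False, which is why the paragraph above describes it as a specific instance of MLM.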
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
A language model is being developed specifically for a task that involves generating long, coherent passages of text, such as writing a story from an initial prompt. The model must generate the text sequentially, predicting each new word based only on the words that came before it. Which training approach is inherently structured for this type of task, and what is the key reason?
Identifying Language Modeling Approach
Match each characteristic to the language modeling approach it describes. The two approaches are 'Causal Language Modeling' and 'General Masked Language Modeling'.