Comparison of Context Usage in Causal vs. Masked Language Modeling

Causal Language Modeling (CLM) and Masked Language Modeling (MLM) differ fundamentally in how they use context for token prediction. CLM is unidirectional: it uses only the left context (preceding tokens) to predict the token at a given position. In contrast, MLM is bidirectional: it utilizes all unmasked tokens, both to the left and right of a masked token, to make its prediction. This bidirectional approach allows MLM-based models to build a more comprehensive contextual representation of language.
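The difference can be sketched with boolean attention masks. This is a minimal illustration, not any particular model's implementation: for CLM, position i may attend only to positions ≤ i (a lower-triangular mask); for MLM, every position may attend to all unmasked positions, on both sides. The function names and the use of NumPy here are illustrative assumptions.

```python
import numpy as np

def causal_mask(seq_len):
    """CLM: position i attends only to positions <= i (left context)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def mlm_visibility(seq_len, masked_positions):
    """MLM: every position attends to all unmasked tokens, left and right."""
    visible = np.ones(seq_len, dtype=bool)
    visible[list(masked_positions)] = False  # masked tokens are hidden
    return np.tile(visible, (seq_len, 1))

clm = causal_mask(5)
mlm = mlm_visibility(5, masked_positions={2})

# CLM: position 1 cannot see position 3 (right context is hidden)
print(clm[1, 3])   # False
# MLM: the masked position 2 can see position 4 (right context is visible)
print(mlm[2, 4])   # True
```

Note that in the MLM mask the masked position itself is hidden from attention over content, which is why the model must reconstruct it from the surrounding unmasked tokens.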


Updated 2025-10-10


Ch.1 Pre-training - Foundations of Large Language Models