Learn Before
  • General Formulation of a Sequence Model

Comparison of Causal and Masked Language Modeling

Causal Language Modeling (CLM) and Masked Language Modeling (MLM) are two primary pre-training objectives for language models. The key difference lies in the context available for prediction. CLM is unidirectional (auto-regressive), predicting a token $x_i$ using only the preceding tokens $x_{<i}$, as described in the text passage (e.g., $\text{Pr}(x_2|\mathbf{e}_0, \mathbf{e}_1)$). This is suitable for generative tasks. In contrast, MLM is bidirectional, predicting a masked token using both its left and right context, as shown in the image (e.g., predicting a masked $x_1$ using $\text{Pr}(x_1|\mathbf{e}_0, \mathbf{e}_2, \mathbf{e}_4)$). This allows the model to build a deeper understanding of language, making it well-suited for tasks like question answering and sentiment analysis.
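To make the contrast concrete, here is a minimal Python sketch (illustrative only, not from the source; the token names and the choice of masked positions are assumptions) that lists which positions each objective may condition on when predicting a token.

```python
# Minimal sketch contrasting the prediction context of CLM and MLM
# for a toy five-token sequence. Token names and mask positions are
# illustrative assumptions, not from the source text.

tokens = ["x0", "x1", "x2", "x3", "x4"]

def clm_context(i, tokens):
    """Causal LM: token x_i is predicted from the preceding tokens x_{<i} only."""
    return tokens[:i]

def mlm_context(masked_positions, tokens):
    """Masked LM: each masked token is predicted from all unmasked tokens,
    i.e. from both its left and right context."""
    return [t for j, t in enumerate(tokens) if j not in masked_positions]

# CLM example from the passage: Pr(x2 | e0, e1)
print("CLM context for x2:", clm_context(2, tokens))       # ['x0', 'x1']

# MLM example from the passage: mask x1 (and x3), predict x1 from e0, e2, e4
print("MLM context for x1:", mlm_context({1, 3}, tokens))  # ['x0', 'x2', 'x4']
```

The sketch only enumerates visible context; in an actual model this difference is realized through the attention mask (triangular for CLM, full for MLM) and the training loss.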

Tags
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Output Variation in Sequence Models

  • Fundamental Issues in Sequence Model Formulation

  • Role of the [CLS] Token in Sequence Classification

  • Standard Auto-Regressive Probability Factorization

  • Masked Language Modeling

  • Input Formatting with Separator Tokens