1Cademy - Comparison of Causal and Masked Language Modeling

Learn Before

General Formulation of a Sequence Model

Comparison of Causal and Masked Language Modeling

Causal Language Modeling (CLM) and Masked Language Modeling (MLM) are two primary pre-training objectives for language models. They differ in the context available for prediction. CLM is unidirectional (auto-regressive): it predicts a token $x_i$ using only the preceding tokens $x_{<i}$, for example $\text{Pr}(x_2|\mathbf{e}_0, \mathbf{e}_1)$, where $\mathbf{e}_j$ is the input representation at position $j$. Because each prediction depends only on earlier tokens, CLM matches left-to-right generation and suits generative tasks. In contrast, MLM is bidirectional: it predicts a masked token from both its left and right context, for example predicting a masked $x_1$ with $\text{Pr}(x_1|\mathbf{e}_0, \mathbf{e}_2, \mathbf{e}_4)$. Seeing context on both sides lets the model build richer representations of each token, which makes MLM well suited to understanding tasks such as question answering and sentiment analysis.
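The difference in available context can be made concrete with a small sketch. The helper names `clm_context` and `mlm_context` are hypothetical, and the choice of masked positions 1 and 3 is an assumption made so that the output lines up with the two example probabilities above. The code only selects which positions a prediction may condition on; it does not train or run a model.

```python
def clm_context(tokens, i):
    """Causal (CLM) context for position i: only tokens strictly before i."""
    return tokens[:i]


def mlm_context(tokens, masked_positions):
    """Bidirectional (MLM) context: every unmasked token, left and right."""
    masked = set(masked_positions)
    return [tok for j, tok in enumerate(tokens) if j not in masked]


# Placeholder sequence standing in for positions 0..4.
tokens = ["x0", "x1", "x2", "x3", "x4"]

# CLM: predicting x2 conditions on positions 0 and 1 only.
print(clm_context(tokens, 2))       # ['x0', 'x1']

# MLM: with positions 1 and 3 masked (assumed), predicting x1
# conditions on the unmasked positions 0, 2 and 4.
print(mlm_context(tokens, [1, 3]))  # ['x0', 'x2', 'x4']
```

In a Transformer, the same distinction is usually enforced through the attention mask rather than by slicing the input: a causal model hides later positions from each token, while a masked model lets every position attend to all others and replaces the target tokens with a mask symbol.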

14 days ago

References

Reference of Foundations of Large Language Models Course
