
Generalization of Masked Language Modeling to Autoregressive Modeling

The masked language modeling approach can be generalized to encompass both BERT-style bidirectional training and standard autoregressive language modeling. By varying the fraction of masked tokens in the input, the objective shifts along a spectrum: masking a small fraction (roughly 15%, as in BERT) yields the usual masked-prediction objective, while masking all tokens forces the model to generate the entire sequence from scratch, effectively mirroring standard language modeling.
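The idea above can be sketched in code. The helper below is a hypothetical illustration (the function name, `[MASK]` token string, and sampling scheme are assumptions, not from the source): a single masking routine whose ratio parameter interpolates between the BERT-style objective and full-sequence generation.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio, seed=0):
    """Mask a fraction of the input tokens.

    mask_ratio ~= 0.15 resembles BERT-style masked language modeling;
    mask_ratio = 1.0 masks every token, so predicting them all amounts
    to generating the whole sequence from scratch, as in standard
    autoregressive language modeling.
    """
    rng = random.Random(seed)
    n_mask = round(len(tokens) * mask_ratio)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}  # what the model must predict
    return masked, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# BERT-style: a small fraction is masked (here 15% of 6 tokens -> 1 token).
bert_input, bert_targets = mask_tokens(tokens, 0.15)

# Fully masked: every position must be predicted, mirroring generation
# of the entire sequence.
ar_input, ar_targets = mask_tokens(tokens, 1.0)
assert ar_input == [MASK] * len(tokens)
assert len(ar_targets) == len(tokens)
```

The single `mask_ratio` knob is what makes the generalization concrete: the training objective (predict the masked positions) never changes, only how much of the sequence it covers.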

Updated 2026-04-16

Tags

Foundations of Large Language Models

Ch.1 Pre-training - Foundations of Large Language Models


Computing Sciences