Example of Full Sequence Generation via 100% Masking

An extreme application of Masked Language Modeling (MLM) masks 100% of the tokens in a sequence, which effectively turns the training objective into a sequence generation task. The input consists of a [CLS] token followed by one [MASK] token per word, and the model is trained to produce the complete original sentence. For example:

[CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] → ⟨s⟩ The puppies are frolicking outside the house .
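The construction above can be sketched in plain Python. This is a minimal illustration of how one training pair might be built, not code from any specific library; the helper name full_mask_pair and the token strings are assumptions following the text's notation:

```python
# Build a (input, target) training pair for 100%-masked MLM.
# Every input position is [MASK], so the input carries no lexical
# information and the model must generate the whole sentence.

CLS, MASK, BOS = "[CLS]", "[MASK]", "<s>"

def full_mask_pair(sentence: str):
    """Return (input_tokens, target_tokens) for fully masked MLM.

    Each word is replaced by [MASK] in the input; the target is the
    original sentence, with [CLS] aligned to the start token <s>.
    """
    tokens = sentence.split()                 # toy whitespace tokenizer
    inputs = [CLS] + [MASK] * len(tokens)     # [CLS] + one [MASK] per word
    targets = [BOS] + tokens                  # <s> + original words
    return inputs, targets

inp, tgt = full_mask_pair("The puppies are frolicking outside the house .")
print(inp)  # ['[CLS]', '[MASK]', ..., '[MASK]']  (1 + 8 positions)
print(tgt)  # ['<s>', 'The', 'puppies', ..., '.']
```

Note that the input and target have the same length, so the usual per-position MLM prediction head can be trained unchanged; only the masking rate differs from standard MLM.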

Updated 2026-04-16


Ch.1 Pre-training - Foundations of Large Language Models
