Example of BERT-style Input for Masked Language Modeling

To illustrate a BERT-style input and target output format for Masked Language Modeling, consider the sentence:

"The puppies are frolicking outside the house."

By masking two tokens, "frolicking" and the second "the", the model's input becomes:

[CLS] The puppies are [MASK] outside [MASK] house.

The corresponding target output begins with a sequence-start token <s> and contains the original words only at the masked positions; every unmasked position is left blank:

<s> ___ ___ ___ frolicking ___ the ___ ___
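
To make the construction concrete, here is a minimal Python sketch that produces the masked input and target strings shown above. It assumes a toy tokenizer (whitespace split with the final period detached as its own token); real BERT uses WordPiece subword tokenization, and the function name make_mlm_example is illustrative only.

# Minimal sketch of building the masked-input / target pair above.
# Assumes a toy tokenizer; real BERT tokenizes into WordPiece subwords.

def make_mlm_example(sentence, mask_positions):
    """Mask the words at the given indices; return (input, target) strings."""
    words = sentence.replace(".", " .").split()   # detach final period
    inputs, targets = ["[CLS]"], ["<s>"]          # per the format shown above
    for i, word in enumerate(words):
        if i in mask_positions:
            inputs.append("[MASK]")   # hide the original token
            targets.append(word)      # the model must predict it here
        else:
            inputs.append(word)
            targets.append("___")     # no prediction target at this position
    return " ".join(inputs), " ".join(targets)

sentence = "The puppies are frolicking outside the house."
# Mask "frolicking" (word index 3) and the second "the" (word index 5).
masked_input, target = make_mlm_example(sentence, {3, 5})
print(masked_input)  # [CLS] The puppies are [MASK] outside [MASK] house .
print(target)        # <s> ___ ___ ___ frolicking ___ the ___ ___

In practice, training frameworks typically encode the blank positions with an ignore index (for example, -100, the default ignore_index of PyTorch's cross-entropy loss) so that the loss is computed only at the masked positions.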