Example of BERT-style Input for Masked Language Modeling
To illustrate a BERT-style input and target output format for Masked Language Modeling, consider the sentence:
"The puppies are frolicking outside the house."
By masking two tokens, for example "frolicking" and the second "the", and treating the final period as a separate token, the model's input becomes:
[CLS] The puppies are [MASK] outside [MASK] house .
The corresponding target output begins with the sequence start token <s> and contains the original words only at the masked positions, leaving blanks for all unmasked tokens:
<s> ___ ___ ___ frolicking ___ the ___ ___
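
The construction can be reproduced with a short Python sketch. It assumes simple whitespace tokenization and the illustrative special tokens shown above ([CLS], [MASK], <s>, and "___" for blanks); the function name build_mlm_example is hypothetical, and a real pipeline would use a subword tokenizer instead.

# Minimal sketch of building a BERT-style masked input and target.
# Assumes whitespace tokenization and illustrative special tokens;
# build_mlm_example is a hypothetical helper, not a library function.

def build_mlm_example(tokens, masked_positions):
    """Return (input_tokens, target_tokens) for masked language modeling.

    tokens: list of word tokens, e.g. ["The", "puppies", ...]
    masked_positions: set of 0-based indices of tokens to mask
    """
    # Input: prepend [CLS] and replace each masked token with [MASK].
    input_tokens = ["[CLS]"] + [
        "[MASK]" if i in masked_positions else tok
        for i, tok in enumerate(tokens)
    ]

    # Target: prepend <s>, keep the original word only at masked positions,
    # and leave a blank ("___") everywhere else.
    target_tokens = ["<s>"] + [
        tok if i in masked_positions else "___"
        for i, tok in enumerate(tokens)
    ]
    return input_tokens, target_tokens


if __name__ == "__main__":
    sentence = "The puppies are frolicking outside the house .".split()
    masked = {3, 5}  # "frolicking" and the second "the"
    inp, tgt = build_mlm_example(sentence, masked)
    print(" ".join(inp))  # [CLS] The puppies are [MASK] outside [MASK] house .
    print(" ".join(tgt))  # <s> ___ ___ ___ frolicking ___ the ___ ___

Running the sketch prints exactly the input and target sequences shown above, with the [MASK] positions in the input aligned one-for-one with "frolicking" and "the" in the target.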
