BERT-style Masked Language Modeling

BERT-style masked language modeling is a variant in which individual, often non-contiguous, tokens in a sequence are masked or replaced with other words. For instance, given an input like [C] The [M] [M] playing the [M] . (where [C] is a special start token, [M] marks a masked position, and playing replaces the original token chasing), the model is trained to predict the original tokens at the positions selected for corruption. As shown in Table 1, it reconstructs the individual tokens (kitten, is, chasing, and ball) at their respective positions rather than outputting a single concatenated phrase. This approach is typically applied to encoder-only or encoder-decoder models.
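To make the corruption procedure concrete, here is a minimal Python sketch of BERT-style token corruption. It assumes the 80/10/10 mask/random-word/keep split from the original BERT recipe, which this card does not spell out; the function name bert_style_mask, the toy vocabulary, and the corruption probability are illustrative assumptions, not part of the source.

```python
import random

def bert_style_mask(tokens, vocab, mask_token="[M]", corrupt_prob=0.15, seed=0):
    """Corrupt individual tokens BERT-style and record prediction targets.

    Returns (corrupted, targets), where targets maps each corrupted
    position to the original token the model must reconstruct there.
    Special tokens such as [C] would normally be excluded from corruption.
    """
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= corrupt_prob:
            continue                      # leave this token untouched
        targets[i] = tok                  # loss is computed only at this position
        r = rng.random()
        if r < 0.8:
            corrupted[i] = mask_token     # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original token, but still predict it
    return corrupted, targets

tokens = "The kitten is chasing the ball .".split()
vocab = ["playing", "dog", "red", "runs"]     # toy vocabulary (assumption)
corrupted, targets = bert_style_mask(tokens, vocab, corrupt_prob=0.5)
print(corrupted)  # e.g. ['The', '[M]', 'is', 'playing', 'the', '[M]', '.']
print(targets)    # e.g. {1: 'kitten', 3: 'chasing', 5: 'ball'}
```

Training then computes the loss only at the positions recorded in targets, predicting each original token independently at its own position. This per-token objective is what distinguishes BERT-style masking from span-corruption objectives, which emit a single concatenated output sequence for the masked spans.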

