BERT-style Masked Language Modeling

BERT-style masked language modeling is a variant in which individual, often non-contiguous, tokens in a sequence are masked or replaced with other words. For instance, given an input like [C] The [M] [M] playing the [M] . (where [C] is a special start token, [M] marks a masked position, and playing replaces the original token chasing), the model is trained to predict the original tokens at the positions selected for corruption. As shown in Table 1, it reconstructs the individual tokens (kitten, is, chasing, and ball) at their respective positions rather than outputting a single concatenated phrase. This approach is typically applied to encoder-only or encoder-decoder models.
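To make the corruption procedure concrete, here is a minimal Python sketch of BERT-style token corruption. It assumes the 80/10/10 mask/random-word/keep split from the original BERT recipe, which this card does not spell out; the function name bert_style_mask, the toy vocabulary, and the corruption probability are illustrative assumptions, not part of the source.

```python
import random

def bert_style_mask(tokens, vocab, mask_token="[M]", corrupt_prob=0.15, seed=0):
    """Corrupt individual tokens BERT-style and record prediction targets.

    Returns (corrupted, targets), where targets maps each corrupted
    position to the original token the model must reconstruct there.
    Special tokens such as [C] would normally be excluded from corruption.
    """
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= corrupt_prob:
            continue                      # leave this token untouched
        targets[i] = tok                  # loss is computed only at this position
        r = rng.random()
        if r < 0.8:
            corrupted[i] = mask_token     # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original token, but still predict it
    return corrupted, targets

tokens = "The kitten is chasing the ball .".split()
vocab = ["playing", "dog", "red", "runs"]     # toy vocabulary (assumption)
corrupted, targets = bert_style_mask(tokens, vocab, corrupt_prob=0.5)
print(corrupted)  # e.g. ['The', '[M]', 'is', 'playing', 'the', '[M]', '.']
print(targets)    # e.g. {1: 'kitten', 3: 'chasing', 5: 'ball'}
```

Training then computes the loss only at the positions recorded in targets, predicting each original token independently at its own position. This per-token objective is what distinguishes BERT-style masking from span-corruption objectives, which emit a single concatenated output sequence for the masked spans.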

