Example of Masked Language Modeling with Single and Multiple Masks

To illustrate masked language modeling, consider the sentence 'The early bird catches the worm'. A training instance is created by masking one or more tokens. For example, masking a single token might yield 'The early bird [MASK] the worm', where the model must predict 'catches'. Alternatively, masking multiple tokens, such as 'early' and 'worm' (e.g., at indices i_1 = 2 and i_2 = 6), yields the corrupted sequence 'The [MASK] bird catches the [MASK]'. The model must then correctly predict the masked words based on the surrounding context.
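The construction above can be sketched in a few lines of Python. This is a minimal illustration, not an actual tokenizer or training pipeline: `mask_tokens` is a hypothetical helper, and it uses 0-based list indices, whereas the text counts token positions from 1.

```python
def mask_tokens(tokens, mask_indices, mask_token="[MASK]"):
    """Replace tokens at the given (0-based) indices with the mask token.

    Returns the corrupted sequence and the labels (the original tokens)
    that the model must predict at the masked positions.
    """
    corrupted = list(tokens)
    labels = {}
    for i in mask_indices:
        labels[i] = corrupted[i]
        corrupted[i] = mask_token
    return corrupted, labels

sentence = "The early bird catches the worm".split()

# Single mask: hide 'catches' (0-based index 3)
corrupted, labels = mask_tokens(sentence, [3])
print(" ".join(corrupted))   # The early bird [MASK] the worm
print(labels)                # {3: 'catches'}

# Multiple masks: hide 'early' and 'worm' (0-based indices 1 and 5,
# i.e. positions i_1 = 2 and i_2 = 6 in the text's 1-based counting)
corrupted, labels = mask_tokens(sentence, [1, 5])
print(" ".join(corrupted))   # The [MASK] bird catches the [MASK]
print(labels)                # {1: 'early', 5: 'worm'}
```

During training, the model receives the corrupted sequence as input and is scored only on its predictions at the masked positions.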

Updated 2026-04-15

Ch.1 Pre-training - Foundations of Large Language Models