Example of Masked Language Modeling with Single and Multiple Masks
To illustrate Masked Language Modeling, consider the sentence 'The early bird catches the worm'. A training instance is created by masking one or more tokens. For example, masking a single token might yield 'The early bird [MASK] the worm', where the model must predict 'catches'. Alternatively, masking multiple tokens, such as 'early' and 'worm', yields the corrupted sequence 'The [MASK] bird catches the [MASK]'. The model must then predict both masked words from the surrounding context.
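A minimal sketch of how the two training instances above can be constructed. The mask_tokens helper, the whitespace tokenization, and the 0-based positions are illustrative assumptions, not the course's implementation.

```python
def mask_tokens(tokens, positions, mask_symbol="[MASK]"):
    """Return a copy of `tokens` with the given positions replaced by `mask_symbol`."""
    return [mask_symbol if i in positions else tok for i, tok in enumerate(tokens)]

sentence = "The early bird catches the worm"
tokens = sentence.split()  # ['The', 'early', 'bird', 'catches', 'the', 'worm']

# Single mask: hide 'catches' (position 3, 0-based); the target is 'catches'.
single = mask_tokens(tokens, {3})
print(" ".join(single))    # The early bird [MASK] the worm

# Multiple masks: hide 'early' and 'worm' (positions 1 and 5, 0-based);
# the targets are 'early' and 'worm'.
multiple = mask_tokens(tokens, {1, 5})
print(" ".join(multiple))  # The [MASK] bird catches the [MASK]
```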
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Example of Masked Language Modeling with Single and Multiple Masks
In a masked modeling approach, an input sequence x is transformed into a modified sequence x̄ by replacing tokens at a randomly selected set of positions A(x) with a special [MASK] symbol. Given an input sequence x = (T1, T2, T3, T4, T5, T6) and a set of selected positions A(x) = {2, 5} (using 0-based indexing), what is the resulting modified sequence x̄?
Critique of a Masking Implementation
In the formal definition of the masking process used in language models, several components are used to describe the transformation of an input sequence. Match each symbolic component with its correct description.
Learn After
A language model is being trained on the sentence: 'The chef carefully seasoned the delicious soup with a pinch of salt.' Consider two different training examples created from this sentence:
Example 1: '... seasoned the delicious soup with a [MASK] of salt.'
Example 2: 'The [MASK] carefully seasoned the delicious [MASK] with a pinch of salt.'
Which statement best analyzes the difference in the learning objective between these two examples?
Applying Masking Strategies for Language Models
Applying Masked Language Modeling