Encoder Processing of a Corrupted Sequence in MLM
After a sequence is corrupted for Masked Language Modeling (MLM), such as [CLS] It is [MASK] . [SEP] I need [MASK] hat . [SEP], it is passed to the Transformer encoder for training. Each token in the modified sequence is first converted into an input embedding (e.g., e0, e1, ... e11). The encoder then processes this sequence of embeddings to produce a sequence of contextualized hidden states (e.g., h0, h1, ... h11). The model is then trained to use these hidden states to predict the original tokens at the altered positions (e.g., 'raining' and 'an' at the two [MASK] positions, and 'umbrella' at the position where a random token replaced it).
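The data flow above (corrupted tokens → input embeddings → contextualized hidden states → prediction at the altered positions) can be sketched in plain Python. This is a schematic only: the embeddings are random vectors, and `toy_encoder` is a hypothetical stand-in that merely mixes each position with the sequence mean to mimic contextualization; a real Transformer encoder uses self-attention and feed-forward layers.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary; ids and vectors are illustrative only.
vocab = ["[CLS]", "[SEP]", "[MASK]", "It", "is", ".", "I", "need", "hat",
         "raining", "an", "umbrella"]

corrupted = ["[CLS]", "It", "is", "[MASK]", ".", "[SEP]",
             "I", "need", "[MASK]", "hat", ".", "[SEP]"]
targets = {3: "raining", 8: "an", 9: "umbrella"}  # positions the loss covers

dim = 8
# e0 .. e11: one input embedding per token in the corrupted sequence.
embed = {t: [random.gauss(0, 1) for _ in range(dim)] for t in vocab}
inputs = [embed[t] for t in corrupted]

def toy_encoder(xs):
    """Stand-in for the Transformer encoder: each output h_i mixes its own
    input with the mean of all inputs, so every hidden state depends on the
    whole sequence (contextualization)."""
    n = len(xs)
    mean = [sum(v[d] for v in xs) / n for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in xs]

hidden = toy_encoder(inputs)  # h0 .. h11, one hidden state per position

# Training would score the vocabulary at each target position from its
# hidden state and push up the score of the original token.
assert len(hidden) == len(corrupted)
```

Note that the loss is computed only at the chosen positions, but every hidden state is produced, so each prediction can draw on the full context.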
Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences
Related
A language model's pre-training process involves corrupting input text. First, a subset of tokens (15%) is chosen for modification. Of these chosen tokens, 80% are replaced by a [MASK] token, 10% are replaced by a random token from the vocabulary, and 10% are left unchanged. The model is then trained to predict the original tokens for all chosen positions. Given the following transformation:
Original: [CLS] The artist painted a beautiful landscape . [SEP]
Corrupted: [CLS] The artist painted a beautiful [MASK] . [SEP]
If 'artist' and 'landscape' were the only two tokens chosen for modification, which statement provides the most accurate analysis of the corruption process?
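The 80/10/10 corruption rule described above can be sketched as a small function. This is a minimal illustration, not a production implementation; the function name, the per-token sampling of the 15% rate, and the example vocabulary are all assumptions for the sketch.

```python
import random

def corrupt_for_mlm(tokens, vocab, mask_rate=0.15, rng=random):
    """BERT-style corruption sketch: pick roughly 15% of non-special
    positions; of those, 80% become [MASK], 10% become a random vocabulary
    token, and 10% stay unchanged. Returns the corrupted sequence and the
    chosen positions whose original tokens must be predicted."""
    special = {"[CLS]", "[SEP]", "[MASK]"}
    corrupted = list(tokens)
    chosen = []
    for i, tok in enumerate(tokens):
        if tok in special or rng.random() >= mask_rate:
            continue
        chosen.append(i)
        r = rng.random()
        if r < 0.8:
            corrupted[i] = "[MASK]"            # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: random token
        # else: 10%: left unchanged, but still predicted

    return corrupted, chosen

# Illustrative call on the example sequence (output varies with the seed).
seq = "[CLS] The artist painted a beautiful landscape . [SEP]".split()
corrupted, chosen = corrupt_for_mlm(seq, vocab=["hat", "sun", "book"],
                                    rng=random.Random(7))
```

The unchanged case is why 'artist' in the question can be a chosen token even though it looks untouched: the model must still predict it, which keeps the model from assuming every non-[MASK] token is correct.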
Evaluating a Pre-training Data Corruption Step
A text sequence is being prepared for a language model's training. The goal is to intentionally alter the sequence so the model can learn to predict the original words from the altered version. Arrange the following steps to correctly describe this data preparation pipeline.
Learn After
A language model is being trained using the following modified input sequence:
[CLS] The sun is very [MASK] today . [SEP]
This sequence is converted into input embeddings and passed through a multi-layer encoder. Which of the following statements most accurately describes the final hidden state vector that corresponds to the [MASK] token after it has been processed by the encoder?

A language model is being trained on the corrupted input sequence:
[CLS] The book was so [MASK] . [SEP]
Arrange the following steps in the correct chronological order, showing how the model processes this input to generate a representation suitable for predicting the masked word.

Diagnosing Contextualization Failure in Model Training