Example

Encoder Processing of a Corrupted Sequence in MLM

After a sequence is corrupted for Masked Language Modeling (MLM), such as [CLS] It is [MASK] . [SEP] I need [MASK] hat . [SEP], it is passed to the Transformer encoder for training. Each token of the corrupted sequence is first converted into an input embedding (e.g., e0, e1, ..., e11). The encoder processes this sequence of embeddings and produces a sequence of contextualized hidden states (e.g., h0, h1, ..., h11). From these hidden states, the model is trained to predict the original tokens at the altered positions: 'raining' and 'an' at the two [MASK] positions, and 'umbrella' at the position where it was replaced by the random token 'hat'.
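Below is a minimal PyTorch sketch of this training step. It assumes the uncorrupted sentence is "It is raining . I need an umbrella ." and that 'hat' is a random-token replacement for 'umbrella'; the toy vocabulary, the tiny model sizes, and names such as embed, encoder, and lm_head are illustrative assumptions, not part of the original example.

```python
import torch
import torch.nn as nn

# Corrupted input seen by the encoder, and the original tokens used as targets
# (the original sentence is reconstructed here for illustration).
corrupted = "[CLS] It is [MASK] . [SEP] I need [MASK] hat . [SEP]".split()
original  = "[CLS] It is raining . [SEP] I need an umbrella . [SEP]".split()

# Toy vocabulary built from this single example (assumption for the sketch).
vocab   = {tok: i for i, tok in enumerate(sorted(set(corrupted + original)))}
ids     = torch.tensor([[vocab[t] for t in corrupted]])  # (1, 12) encoder input
targets = torch.tensor([[vocab[t] for t in original]])   # (1, 12) prediction targets

d_model = 32
embed   = nn.Embedding(len(vocab), d_model)                       # token -> e0 ... e11
pos     = nn.Parameter(torch.zeros(1, len(corrupted), d_model))   # learned position embeddings
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(d_model, len(vocab))                          # hidden state -> vocabulary logits

h      = encoder(embed(ids) + pos)   # contextualized hidden states h0 ... h11, shape (1, 12, d_model)
logits = lm_head(h)                  # (1, 12, |V|)

# The loss is computed only at the altered positions: the two [MASK] slots
# (positions 3 and 8) and the randomly replaced token at position 9.
altered = torch.tensor([3, 8, 9])
loss = nn.functional.cross_entropy(logits[0, altered], targets[0, altered])
print(loss.item())
```

Only the altered positions contribute to the loss; the hidden states at the unchanged positions are still computed, so every prediction can draw on bidirectional context from the full sequence.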
