Activity (Process)

BERT's Masked Language Modeling Pre-training Pipeline

BERT's Masked Language Modeling (MLM) pre-training pipeline is a multi-step process. It begins with an input token sequence, from which 15% of the tokens are randomly selected as prediction targets. Of these selected tokens, 80% are replaced with the [MASK] token, 10% are replaced with random tokens from the vocabulary, and 10% are left unchanged. The corrupted sequence is converted into embeddings and processed by a Transformer encoder to produce contextualized hidden states. Finally, the model is trained to recover the original tokens at the selected positions from these hidden states, with the loss computed only at those positions.
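
The selection and 80/10/10 corruption step can be sketched in a few lines. The snippet below is a minimal PyTorch illustration, assuming a WordPiece-style vocabulary of roughly 30k tokens in which ID 103 is [MASK] (as in the original BERT vocabulary); real pipelines also exclude special tokens such as [CLS] and [SEP] from selection, which this sketch omits.

```python
import torch

# Illustrative constants: a ~30k WordPiece vocabulary in which 103 is [MASK]
# (these match the original bert-base-uncased vocabulary, but any IDs would do).
VOCAB_SIZE = 30522
MASK_ID = 103

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Apply BERT-style 15% selection with 80/10/10 corruption.

    Returns (corrupted_ids, labels): labels hold the original IDs at the
    selected positions and -100 elsewhere, so the loss ignores them.
    """
    labels = input_ids.clone()

    # Randomly select ~15% of positions as prediction targets.
    selected = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~selected] = -100

    corrupted = input_ids.clone()

    # 80% of the selected positions are replaced with [MASK].
    masked = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    corrupted[masked] = MASK_ID

    # Half of the remaining selected positions (10% overall) get a random token.
    randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~masked
    corrupted[randomized] = torch.randint(VOCAB_SIZE, labels.shape)[randomized]

    # The final 10% of selected positions keep their original token.
    return corrupted, labels

# Toy usage: corrupt a batch of 2 sequences of length 16.
batch = torch.randint(1000, 2000, (2, 16))
corrupted, labels = mask_tokens(batch)
```

The corrupted IDs are what the Transformer encoder sees; its hidden states are projected by a language-modeling head to vocabulary logits, and cross-entropy is computed only where the labels differ from -100 (for example via torch.nn.functional.cross_entropy with ignore_index=-100).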

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences