Learn Before
  • Masked Language Modeling (MLM) as a Pre-training Task

BERT's Masked Language Modeling Pre-training Pipeline

The pre-training pipeline for BERT's Masked Language Modeling (MLM) is a multi-step process. It begins with an input sequence, from which 15% of the tokens are randomly selected for prediction. These selected tokens are then corrupted: 80% are replaced with the special [MASK] token, 10% are replaced with random tokens, and 10% are left unchanged. The modified sequence is converted into embeddings and processed by a Transformer encoder to produce contextualized hidden states. Finally, the hidden states at the selected positions are used to predict the original tokens at those positions, and the model is trained on this prediction objective. A minimal sketch of the selection and corruption step is given below.
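
The following is a minimal Python sketch of the token selection and 80/10/10 corruption step described above. The function name corrupt_for_mlm, the string-level tokens, and the toy vocab are illustrative assumptions, not BERT's actual subword-level implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def corrupt_for_mlm(tokens, vocab, select_prob=0.15, seed=None):
    """BERT-style MLM corruption of a token list.

    Returns the corrupted sequence and a dict mapping each selected
    position (about 15% of tokens) to its original token, which serves
    as the prediction target for that position.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}

    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: never used as a prediction target
        targets[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
        # else (10%): leave the original token unchanged

    return corrupted, targets


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    corrupted, targets = corrupt_for_mlm(tokens, vocab, seed=0)
    print(corrupted)  # corrupted sequence fed to the Transformer encoder
    print(targets)    # positions whose original tokens the model must predict
```

In the full pipeline, the corrupted sequence would be converted to embeddings and passed through the Transformer encoder, and the loss would be computed only over the positions recorded in targets.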

Tags
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Comparison of Masked vs. Causal Language Modeling

  • Formal Definition of the Masking Process in MLM

  • Example of Masked Language Modeling with Single and Multiple Masks

  • Training Objective of Masked Language Modeling (MLM)

  • Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

  • Limitation of MLM: Ignoring Dependencies Between Masked Tokens

  • The Generator in Replaced Token Detection

  • Example Sentence for Masking and Reconstruction Task

  • Generalization of MLM via Masking Percentage

  • Consecutive Token Masking in MLM

  • BERT's Training Objective and Innovations

  • Token Selection and Modification Strategy in BERT's MLM

  • BERT's Masked Language Modeling Pre-training Pipeline

  • Performance Degradation and Early Stopping in Pre-training