Learn Before
Masked Language Modeling (MLM) as a Pre-training Task
BERT's Masked Language Modeling Pre-training Pipeline
The pre-training pipeline for BERT's Masked Language Modeling (MLM) is a multi-step process. It begins with an input sequence, from which 15% of the tokens are randomly selected for prediction. Of these selected tokens, 80% are replaced with the special [MASK] token, 10% are replaced with random tokens from the vocabulary, and 10% are left unchanged. The modified sequence is converted into embeddings and processed by a Transformer Encoder to produce contextualized hidden states. Finally, the model is trained to predict the original identities of the selected tokens from their hidden states.
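To make the token-selection step concrete, here is a minimal Python sketch of the 15% / 80-10-10 corruption rule. The toy vocabulary, the `mask_tokens` helper, and its parameters are illustrative assumptions for this card, not BERT's actual implementation, which operates on WordPiece token IDs over a ~30k-entry vocabulary.

```python
import random

MASK_TOKEN = "[MASK]"
# Hypothetical toy vocabulary used only for the 10% random-replacement case.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def mask_tokens(tokens, select_prob=0.15, seed=None):
    """Apply BERT-style MLM corruption to a token sequence.

    Returns (corrupted_tokens, labels), where labels[i] holds the original
    token at each selected position and None at positions the loss ignores.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)

    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue                          # not selected: no prediction target
        labels[i] = tok                       # model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(VOCAB)  # 10%: replace with a random token
        # else: remaining 10% keep the original token unchanged
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, seed=0)
print(corrupted)  # corrupted input fed to the Transformer Encoder
print(labels)     # prediction targets (None = excluded from the MLM loss)
```

Note that the loss is computed at every selected position, including the 10% left unchanged, which is why the labels track selection rather than alteration.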
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Example Sentence for Masking and Reconstruction Task
Generalization of MLM via Masking Percentage
Consecutive Token Masking in MLM
BERT's Training Objective and Innovations
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training