Training Objective of Masked Language Modeling (MLM)
Given an original text sequence $\mathbf{x}$ and its corrupted version $\bar{\mathbf{x}}$, optimizing a model to predict $\mathbf{x}$ based on $\bar{\mathbf{x}}$ can be thought of as an autoencoding-like process. The fundamental training objective is to maximize the reconstruction probability $\Pr(\mathbf{x} \mid \bar{\mathbf{x}})$. However, because there is a simple position-wise alignment between the two sequences, an unmasked token in $\bar{\mathbf{x}}$ is identical to the token at the same position in $\mathbf{x}$. Since these unmasked tokens need no prediction, the training objective simplifies to maximizing the probabilities of the masked tokens only, i.e., $\sum_{i \in \mathcal{M}} \log \Pr(x_i \mid \bar{\mathbf{x}})$, where $\mathcal{M}$ denotes the set of masked positions.
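As a concrete illustration, here is a minimal sketch of this simplified objective in PyTorch (an assumption of this note, not prescribed by the source). The tensor names (`input_ids`, `masked_positions`, `logits`) are hypothetical stand-ins; the key mechanism is that labels at unmasked positions are set to `ignore_index`, so the cross-entropy loss is summed over masked tokens only.

```python
import torch
import torch.nn.functional as F

vocab_size = 30522    # e.g., BERT's WordPiece vocabulary size (illustrative)
ignore_index = -100   # positions with this label are skipped by the loss

# Hypothetical tensors for illustration:
# input_ids        -- original token ids, shape (batch, seq_len)
# masked_positions -- True where a token was replaced by [MASK]
# logits           -- model outputs over the vocabulary, (batch, seq_len, vocab_size)
input_ids = torch.randint(0, vocab_size, (2, 8))
masked_positions = torch.zeros(2, 8, dtype=torch.bool)
masked_positions[:, 3] = True              # mask one position per sequence
logits = torch.randn(2, 8, vocab_size)     # stand-in for model(corrupted_ids)

# Labels equal the original ids at masked positions and ignore_index elsewhere,
# so the loss covers only the masked tokens -- the simplified MLM objective.
labels = input_ids.masked_fill(~masked_positions, ignore_index)
loss = F.cross_entropy(
    logits.view(-1, vocab_size),
    labels.view(-1),
    ignore_index=ignore_index,
)
print(loss)  # negative log-likelihood of the masked tokens only
```

Unmasked positions could equally be included in the loss, but since their targets are copied verbatim from the input, they carry no learning signal; excluding them is the standard simplification.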

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
MLM Training Objective using Cross-Entropy Loss
MLM Training Objective as Maximum Likelihood Estimation
A language model is being trained using a masked language modeling objective. The input is a sentence where some words have been replaced with a [MASK] token. While the high-level goal is to enable the model to reconstruct the original sentence from this corrupted input, the practical training objective is more specific. Which statement best analyzes the actual, simplified objective the model optimizes during training and the reason for this simplification?
Evaluating an MLM Training Implementation
During the training of a language model with a masked language modeling objective, the model is optimized to predict the entire original text sequence, including the tokens that were not masked, from the corrupted input.