The Generator in Replaced Token Detection
In the Replaced Token Detection framework, the generator is a small masked language model whose job is to produce a corrupted version of the original input text. The procedure has two main steps: first, a random subset of tokens in the sequence is masked; second, the generator, which is trained to predict the original token at each masked position, fills those positions with its own predictions. The result is a new sequence in which each masked token has been replaced by the generator's prediction, which may or may not match the original token. This altered sequence is then passed to the discriminator.
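To make the two-step procedure concrete, here is a minimal Python sketch of the corruption step. The function name corrupt_with_generator, the generator_predict callable, and the toy_generator stand-in are illustrative assumptions rather than part of any particular library; a real setup would use a small trained masked language model for the prediction step.

```python
import random

def corrupt_with_generator(tokens, generator_predict, mask_rate=0.15, mask_token="[MASK]"):
    """Build a corrupted token sequence plus per-token 'replaced?' labels.

    `generator_predict` is any callable mapping (masked tokens, masked positions)
    to one predicted token per masked position (hypothetical interface).
    """
    # Step 1: randomly choose a subset of positions to mask.
    positions = [i for i in range(len(tokens)) if random.random() < mask_rate]

    # Build the masked input that the generator sees.
    masked = [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

    # Step 2: the generator predicts a token for each masked position.
    predictions = generator_predict(masked, positions)

    # Splice the predictions back in; a prediction may or may not match the original.
    corrupted = list(tokens)
    for i, pred in zip(positions, predictions):
        corrupted[i] = pred

    # Per-token labels for the discriminator: 1 = replaced, 0 = original.
    # Note that a masked position whose prediction happens to equal the
    # original token is labeled as original.
    labels = [1 if corrupted[i] != tokens[i] else 0 for i in range(len(tokens))]
    return corrupted, labels


# Toy stand-in for a small masked language model: always guesses "cat".
toy_generator = lambda masked, positions: ["cat" for _ in positions]

corrupted, labels = corrupt_with_generator(
    "the quick brown fox jumps over the lazy dog".split(), toy_generator
)
print(corrupted)
print(labels)
```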

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Generator in Replaced Token Detection
The Discriminator in Replaced Token Detection
Joint Training in Replaced Token Detection
Model Usage After Replaced Token Detection Training
Consider a pre-training method for a language model that uses two components. The first component, a 'generator', takes an original sentence and replaces a few words with other plausible words. The second component, a 'discriminator', then reads this modified sentence. The discriminator's task is to examine every single word in the modified sentence and decide for each one: 'Is this word from the original sentence, or is it a replacement?' What is the primary advantage of training the discriminator on this per-word classification task compared to a task where it only has to predict the original identity of the few words that were replaced?
Analyzing a Language Model's Training Step
A language model is being pre-trained using a method where it learns to distinguish original words from plausible replacements. Arrange the following steps of a single training iteration into the correct chronological order.
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
In a two-model pre-training setup, a small 'generator' model first processes an input sentence by masking some words and then filling those masked positions with its own predictions. The resulting, potentially altered, sentence is then passed to a larger 'discriminator' model. What is the most critical function of the generator's output in this process?
Evaluating Corrupted Text for Model Training
A small masked language model is used to create a corrupted version of an input text sequence for a subsequent training task. Arrange the steps this model takes to generate the final corrupted sequence in the correct chronological order.
Visual Example of Generator Operation in Replaced Token Detection