The Discriminator in Replaced Token Detection
The discriminator is the primary Transformer encoder trained alongside the generator in the replaced token detection task. It takes as input the altered sequence produced by the generator. Its goal is to examine each token in this sequence and perform a binary classification: is the token identical to the original input, or was it replaced? Specifically, it assigns a label of either 0 or 1 to each token.
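A toy sketch of the per-token targets the discriminator is trained to predict. The helper name `rtd_labels` and the convention 0 = identical to the original, 1 = replaced are illustrative assumptions, not part of any specific library:

```python
def rtd_labels(original_tokens, altered_tokens):
    # Ground-truth labels for replaced token detection:
    # 0 = token matches the original input, 1 = token was replaced.
    return [0 if alt == orig else 1
            for orig, alt in zip(original_tokens, altered_tokens)]

print(rtd_labels(["the", "chef", "cooked"], ["the", "chef", "prepared"]))
# [0, 0, 1]
```

In practice the discriminator does not see the original sentence; it must predict these labels from the altered sequence alone, which is what makes the task non-trivial.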
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Generator in Replaced Token Detection
The Discriminator in Replaced Token Detection
Joint Training in Replaced Token Detection
Model Usage After Replaced Token Detection Training
Consider a pre-training method for a language model that uses two components. The first component, a 'generator', takes an original sentence and replaces a few words with other plausible words. The second component, a 'discriminator', then reads this modified sentence. The discriminator's task is to examine every single word in the modified sentence and decide for each one: 'Is this word from the original sentence, or is it a replacement?' What is the primary advantage of training the discriminator on this per-word classification task compared to a task where it only has to predict the original identity of the few words that were replaced?
Analyzing a Language Model's Training Step
A language model is being pre-trained using a method where it learns to distinguish original words from plausible replacements. Arrange the following steps of a single training iteration into the correct chronological order.
Visual Example of Discriminator Operation in Replaced Token Detection
In a particular self-supervised learning setup, a 'generator' model first processes an input sentence and replaces some of its words with plausible alternatives. A second, more powerful 'discriminator' model then receives this altered sentence. The discriminator's task is to examine each word and determine if it is identical to the word in the original, unaltered sentence.
Consider this example:
- Original Sentence: "The scientist discovered the new element."
- Altered Sentence from Generator: "The scientist found the new element."
Given the discriminator's task, how should it classify the words 'found' and 'element' from the altered sentence?
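The expected classification for this example can be sketched by comparing the two sequences position by position (an illustrative check, not an actual model):

```python
# Target labels for the example above: a token is 'replaced' only if it
# differs from the word at the same position in the original sentence.
original = "The scientist discovered the new element .".split()
altered = "The scientist found the new element .".split()

labels = ["replaced" if a != o else "original"
          for o, a in zip(original, altered)]

for word, label in zip(altered, labels):
    print(word, "->", label)
```

So 'found' should be classified as replaced (it differs from the original 'discovered'), while 'element' should be classified as original (it is unchanged).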
In a replaced token detection task, a generator model is given the sentence 'The chef cooked the meal' with the word 'cooked' masked. The generator happens to predict the word 'cooked', so the mask is filled with the original word. The discriminator then receives the sentence 'The chef cooked the meal'. According to the discriminator's objective, it should classify the word 'cooked' as 'original', since the generator's prediction matches the token in the original sentence.
Discriminator Performance Analysis