Model Usage After Replaced Token Detection Training
Once the training for Replaced Token Detection is finished, the two models involved have different fates. The generator, having served its purpose of creating a challenging training task, is discarded. The discriminator's encoder, which has learned rich contextual representations, is preserved and used as the pre-trained model for various downstream natural language understanding tasks.
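This hand-off can be sketched in minimal Python. All class and function names below are illustrative stand-ins (not from any specific library), assuming a small generator and a larger discriminator as described above:

```python
# Sketch of the post-pre-training hand-off in Replaced Token Detection.
# All names here are hypothetical; a real setup would use Transformer
# encoders and a fine-tuning loop.

class Encoder:
    """Stands in for a contextual encoder with a given hidden size."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


class Generator:
    """Small model whose only role is to corrupt text with plausible
    replacement tokens, creating the discriminator's training task."""
    def __init__(self):
        self.encoder = Encoder(hidden_size=64)


class Discriminator:
    """Larger model trained to label each token as original or replaced.
    Its encoder is what carries the learned representations."""
    def __init__(self):
        self.encoder = Encoder(hidden_size=256)


def build_downstream_classifier(discriminator, num_labels):
    """Keep only the discriminator's encoder and attach a new
    task-specific head (here just a label count) for fine-tuning,
    e.g. on sentiment classification."""
    return {"encoder": discriminator.encoder, "num_labels": num_labels}


generator = Generator()          # served its purpose during pre-training
discriminator = Discriminator()  # its encoder is the artifact we keep

classifier = build_downstream_classifier(discriminator, num_labels=2)
del generator  # the generator is discarded after pre-training
```

The key point the sketch illustrates is that the downstream model is built only from the discriminator's encoder; the generator never participates in fine-tuning.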
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Generator in Replaced Token Detection
The Discriminator in Replaced Token Detection
Joint Training in Replaced Token Detection
Model Usage After Replaced Token Detection Training
Consider a pre-training method for a language model that uses two components. The first component, a 'generator', takes an original sentence and replaces a few words with other plausible words. The second component, a 'discriminator', then reads this modified sentence. The discriminator's task is to examine every single word in the modified sentence and decide for each one: 'Is this word from the original sentence, or is it a replacement?' What is the primary advantage of training the discriminator on this per-word classification task compared to a task where it only has to predict the original identity of the few words that were replaced?
Analyzing a Language Model's Training Step
A language model is being pre-trained using a method where it learns to distinguish original words from plausible replacements. Arrange the following steps of a single training iteration into the correct chronological order.
Learn After
A machine learning team has just finished pre-training a language model using a two-part system. The first, smaller model corrupted text by replacing some words with plausible alternatives. The second, larger model was then trained to identify which words in the text were original and which were replacements. The team's ultimate goal is to use this work to build a system for classifying the sentiment of customer reviews. What is the most effective and standard next step for the team to take?
Fate of Models in a Two-Part Pre-training Scheme