Fate of Models in a Two-Part Pre-training Scheme
Consider a pre-training method where a smaller 'generator' model corrupts a text by replacing some words, and a larger 'discriminator' model is trained to identify which words were replaced. After this training process is complete, describe the typical fate of both the generator and the discriminator models, and explain the reasoning behind these decisions.
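The question describes an ELECTRA-style setup: a generator corrupts the input, and the discriminator does per-token binary classification ("original" vs. "replaced"). A minimal sketch of how such training labels arise, using a toy corruption function with a made-up vocabulary (all names here are illustrative, not from any library):

```python
import random

random.seed(0)

def generator_corrupt(tokens, replace_rate=0.3, vocab=("cat", "dog", "sun", "mat")):
    """Toy 'generator': replaces some tokens with plausible alternatives.

    Returns the corrupted sequence plus per-token labels:
    1 = replaced, 0 = original. A real generator would be a small
    masked language model proposing contextually plausible words.
    """
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < replace_rate:
            replacement = random.choice([w for w in vocab if w != tok])
            corrupted.append(replacement)
            labels.append(1)  # replaced token
        else:
            corrupted.append(tok)
            labels.append(0)  # original token
    return corrupted, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = generator_corrupt(tokens)
# The discriminator is trained to recover `labels` from `corrupted` alone:
# a binary "was this token replaced?" decision at every position.
```

Because the discriminator gets a learning signal at every token position (not just the masked ones), this objective is more sample-efficient than standard masked language modeling, which is a key motivation for the scheme.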
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
A machine learning team has just finished pre-training a language model using a two-part system. The first, smaller model corrupted text by replacing some words with plausible alternatives. The second, larger model was then trained to identify which words in the text were original and which were replacements. The team's ultimate goal is to use this work to build a system for classifying the sentiment of customer reviews. What is the most effective and standard next step for the team to take?