Shift in Training Objective with 100% Masking
A model is being trained on a text corpus. In one training configuration, 15% of the words in each sentence are randomly replaced with a special [MASK] token, and the model must predict the original words. In a second configuration, 100% of the words are replaced with [MASK] tokens. Analyze how the fundamental task the model is learning to perform differs between these two configurations.
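To make the contrast concrete, the two configurations can be sketched with a small masking routine. This is an illustrative helper (the function `mask_tokens` and the word-level tokenization are assumptions for the sketch, not part of the source): at a 15% rate most of the sentence survives as conditioning context, while at a 100% rate every position becomes [MASK] and no context remains.

```python
import random

def mask_tokens(tokens, mask_rate, mask_token="[MASK]", seed=0):
    """Replace a fraction of tokens with a mask token.

    Returns (masked, targets): `targets` maps each masked position
    to the original word the model must predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

sentence = "the cat sat on the mat".split()

# ~15% masking: most words survive, so prediction is contextual.
partial, partial_targets = mask_tokens(sentence, mask_rate=0.15)

# 100% masking: every word is replaced; the model sees only
# [MASK] tokens and their positions, with no lexical context.
full, full_targets = mask_tokens(sentence, mask_rate=1.0)
```

Note that at `mask_rate=1.0` the input carries no information beyond sentence length and position, which is what changes the nature of the prediction task.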
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider a text-infilling model that is typically trained by masking about 15% of the words in a sentence and having the model predict them based on the surrounding unmasked words. If this training process is modified to mask 100% of the words in every input sentence, what is the most significant change in the fundamental skill the model is being trained to perform?
Model Suitability for a Generation Task