Constructing a 100% Masked Training Example
Consider the sentence: 'The quick brown fox jumps.' For a model being trained with an objective where 100% of the input tokens are masked, what would the input sequence fed to the model look like, and what would be the corresponding target output sequence the model is trained to generate? Assume the model uses [CLS] as a starting token and [MASK] for masked tokens.
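The construction can be sketched in a few lines of Python. This is a minimal illustration assuming simple word-level tokenization (a real model would use a subword tokenizer); the token lists below are hypothetical, not output from any specific model.

```python
# Minimal sketch of a 100%-masked training pair, assuming word-level
# tokenization of the example sentence.
sentence = "The quick brown fox jumps."
tokens = ["The", "quick", "brown", "fox", "jumps", "."]

# Input fed to the model: [CLS] followed by one [MASK] per original token.
model_input = ["[CLS]"] + ["[MASK]"] * len(tokens)

# Target the model is trained to generate: the full original token sequence.
target = tokens

print(model_input)
print(target)
```

With every position masked, the model receives no token-level evidence from the input itself and must reconstruct the sentence from scratch, which is what pushes this objective toward generation rather than fill-in-the-blank prediction.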
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Training the Decoder as a Language Model in 100% Masking Scenarios
A language model is trained using an objective where every token in the input sentence is replaced by a [MASK] token. The model is then required to reconstruct the entire original sentence. How does the primary skill developed by this training method differ from a method where only a small fraction (e.g., 15%) of the tokens are masked?
Evaluating a Model Training Strategy