Example of Full Sequence Generation via 100% Masking
An extreme application of Masked Language Modeling (MLM) involves masking 100% of the tokens in a sequence, which effectively transforms the training objective into a sequence generation task. In this scenario, the input consists of a [CLS] token followed by one [MASK] token for every token of the original sentence (punctuation included), and the model is trained to generate the complete original sentence. Because the masked input reveals nothing about the original tokens except how many there are, the model must produce the sentence on its own, much like a language model. For example: [CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] → ⟨s⟩ The puppies are frolicking outside the house .
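The construction of such a training pair can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization (real models use subword tokenizers) and a hypothetical helper name `make_full_mask_example`; it shows the encoder input with every position masked and the decoder target that starts with ⟨s⟩.

```python
from typing import List, Tuple

CLS, MASK, BOS = "[CLS]", "[MASK]", "⟨s⟩"

def make_full_mask_example(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build an (encoder input, decoder target) pair with 100% masking."""
    # The encoder sees only [CLS] plus one [MASK] per original token,
    # so it learns nothing about the words, only the sequence length.
    encoder_input = [CLS] + [MASK] * len(tokens)
    # The decoder must regenerate the whole sentence after a start symbol.
    decoder_target = [BOS] + list(tokens)
    return encoder_input, decoder_target

# Assumption: simple whitespace tokenization, with the period as its own token.
sentence = "The puppies are frolicking outside the house .".split()
enc, dec = make_full_mask_example(sentence)
print(" ".join(enc))
print(" ".join(dec))
```

Running it prints the masked input `[CLS]` followed by eight `[MASK]` tokens, and the target `⟨s⟩ The puppies are frolicking outside the house .`, matching the example above.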
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Full Sequence Generation via 100% Masking
A research team is pre-training two separate encoder-decoder models using different variations of a masked language modeling objective.
- Model A is trained by masking 15% of the input tokens, with each mask covering only a single token. The model's objective is to predict the original token for each masked position.
- Model B is trained by masking 50% of the input tokens, with masks covering contiguous spans of up to 10 tokens. The model's objective is to predict the entire original text span.
Which of the following statements most accurately analyzes the likely capabilities these two models will develop based on their pre-training objectives?
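The two corruption schemes described for Model A and Model B can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, tokens are assumed to be pre-split strings, and Model B's spans are assumed to be replaced by a single [MASK] each (as in BART-style text infilling), since the description above does not say how many mask tokens a span receives.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"

def single_token_mask(tokens: List[str], rate: float = 0.15,
                      rng: Optional[random.Random] = None) -> Tuple[List[str], Dict[int, str]]:
    """Model A style (hypothetical helper): one [MASK] per selected token."""
    rng = rng if rng is not None else random.Random(0)
    n = min(len(tokens), round(rate * len(tokens)))
    corrupted = list(tokens)
    targets: Dict[int, str] = {}
    for i in rng.sample(range(len(tokens)), n):
        targets[i] = tokens[i]  # the model predicts this single original token
        corrupted[i] = MASK
    return corrupted, targets

def span_mask(tokens: List[str], rate: float = 0.5, max_span: int = 10,
              rng: Optional[random.Random] = None) -> Tuple[List[str], List[List[str]]]:
    """Model B style (hypothetical helper): mask contiguous spans, one [MASK] per span."""
    rng = rng if rng is not None else random.Random(0)
    remaining = min(len(tokens), round(rate * len(tokens)))
    masked = [False] * len(tokens)
    while remaining > 0:
        # Sample a span of 1..max_span tokens at a random start position.
        # Note: adjacent sampled spans can merge, so a collapsed run may exceed max_span.
        length = rng.randint(1, max_span)
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + length, len(tokens))):
            if remaining > 0 and not masked[i]:
                masked[i] = True
                remaining -= 1
    # Collapse each contiguous masked run into one [MASK]; its target is the whole span.
    corrupted: List[str] = []
    spans: List[List[str]] = []
    i = 0
    while i < len(tokens):
        if masked[i]:
            j = i
            while j < len(tokens) and masked[j]:
                j += 1
            corrupted.append(MASK)
            spans.append(tokens[i:j])
            i = j
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, spans
```

In the first scheme each target is a single token with its neighbors mostly visible, whereas in the second each target is a multi-token span that must be generated in order, which is closer to the sequence generation setting.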
Evaluating Pre-training Objectives for a Multi-Task Model
Match each masked language modeling (MLM) pre-training strategy for an encoder-decoder model with the primary capability it is designed to develop.
Learn After
Training the Decoder as a Language Model in 100% Masking Scenarios
A language model is trained using an objective where every token in the input sentence is replaced by a [MASK] token. The model is then required to reconstruct the entire original sentence. How does the primary skill developed by this training method differ from a method where only a small fraction (e.g., 15%) of the tokens are masked?
Constructing a 100% Masked Training Example
Evaluating a Model Training Strategy