Learn Before
In a language model's pre-training, a portion of input tokens selected for prediction are substituted with a completely random token from the vocabulary, rather than always using a special placeholder like [MASK]. What is the primary analytical justification for this specific strategy?
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Random Token Replacement in a BERT Input Sequence
Predicting from Corrupted Input
A language model's pre-training process involves selecting a subset of tokens in an input sequence for prediction. One modification technique applied to these selected tokens is to substitute them with a completely random token from the model's vocabulary. Given the original sequence:
The cat sat on the mat.
If the token "sat" is chosen for this specific random replacement technique, which of the following is a valid resulting sequence?
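As a rough illustration (not part of the original card), the sketch below shows the standard BERT-style corruption rule, in which tokens selected for prediction are replaced with [MASK] 80% of the time, with a random vocabulary token 10% of the time, and left unchanged 10% of the time. The whitespace tokenizer, the toy vocabulary, and the 15% selection rate are simplifying assumptions for this example only.

```python
import random

# Toy vocabulary used only for this illustration; a real model draws the
# random replacement from its full subword vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", ".", "dog", "ran", "apple"]

def corrupt(tokens, select_prob=0.15, rng=random):
    """Apply the 80/10/10 corruption rule to tokens chosen for prediction."""
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            targets.append(tok)                        # model must recover the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")             # usual placeholder
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))    # random token replacement
            else:
                corrupted.append(tok)                  # kept unchanged
        else:
            targets.append(None)                       # not a prediction target
            corrupted.append(tok)
    return corrupted, targets

tokens = "the cat sat on the mat .".split()
print(corrupt(tokens))
# e.g. if "sat" is selected and falls in the 10% random bucket, a valid
# resulting sequence is: ['the', 'cat', 'dog', 'on', 'the', 'mat', '.']
```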