Learn Before
Consider a standard pre-training procedure for a language model where 15% of all tokens in an input are first selected for prediction. Of these selected tokens, 80% are then replaced with a special [MASK] symbol. Based on this procedure, it is guaranteed that for any given input sequence of 1,000 tokens, exactly 120 tokens will be replaced with the [MASK] symbol.
0
1
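The statement hinges on whether the 120 figure is an exact guarantee or only an expected value. Below is a minimal sketch of the masking step (assuming per-token independent random sampling, as in the original BERT recipe; the function name and parameters are illustrative, not from any particular library). The expected number of [MASK] replacements in a 1,000-token sequence is 0.15 × 0.80 × 1,000 = 120, but because the selection and the 80% replacement are applied at random, the realized count fluctuates around 120 rather than being fixed.

```python
import random

def bert_style_mask(tokens, select_prob=0.15, mask_prob=0.80, mask_token="[MASK]"):
    """Select each token for prediction with probability `select_prob`, then
    replace `mask_prob` (80%) of the selected tokens with the [MASK] symbol.
    (The full BERT recipe also swaps 10% of selected tokens for random tokens
    and leaves 10% unchanged; those cases are omitted here because they do
    not affect the [MASK] count.)"""
    output = list(tokens)
    mask_count = 0
    for i in range(len(output)):
        if random.random() < select_prob:       # token chosen for prediction
            if random.random() < mask_prob:     # 80% of chosen tokens get [MASK]
                output[i] = mask_token
                mask_count += 1
    return output, mask_count

# Expected [MASK] count for 1,000 tokens: 0.15 * 0.80 * 1000 = 120.
# The realized count varies from sequence to sequence rather than being
# exactly 120 every time.
tokens = [f"tok{i}" for i in range(1000)]
print([bert_style_mask(tokens)[1] for _ in range(5)])  # e.g. [113, 124, 119, 127, 116]
```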
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Token Masking in a BERT Input Sequence
During a language model's pre-training, a specific strategy is used to alter words that have been chosen for the model to predict. If 10,000 words in a dataset have been chosen for this prediction task, and the strategy dictates that 80% of these chosen words are replaced with a special placeholder symbol, approximately how many of the 10,000 chosen words will be replaced by this symbol?
Verifying a Language Model's Pre-training Data
Consider a standard pre-training procedure for a language model where 15% of all tokens in an input are first selected for prediction. Of these selected tokens, 80% are then replaced with a special [MASK] symbol. Based on this procedure, it is guaranteed that for any given input sequence of 1,000 tokens, exactly 120 tokens will be replaced with the [MASK] symbol.