Calculating Token Modifications in Pre-training
During a language model's pre-training phase, 15% of tokens in each sequence are selected for a prediction task. Of these selected tokens, 10% are left in their original form. If a given input sequence contains 4,000 tokens, how many tokens would you expect to be selected for prediction but remain unchanged in the input? Provide only the final numerical answer.
60
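The expected count follows from chaining the two percentages: 4,000 × 0.15 × 0.10 = 60. A minimal sketch of that arithmetic (the function name and default rates are illustrative, mirroring the 15% selection and 10% keep-unchanged fractions stated in the question):

```python
def expected_unchanged_tokens(seq_len: int,
                              selected_frac: float = 0.15,
                              unchanged_frac: float = 0.10) -> int:
    """Expected number of tokens selected for prediction yet left unchanged."""
    return round(seq_len * selected_frac * unchanged_frac)

print(expected_unchanged_tokens(4000))  # 4000 * 0.15 * 0.10 = 60
```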
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of an Unchanged Token in a BERT Input Sequence
A language model is pre-trained using a method where 15% of the words in an input sentence are selected for prediction. Of these selected words, a small fraction (10%) are intentionally left in their original form, while the model is still tasked with predicting them based on the surrounding context. What is the most significant reason for this strategy of leaving some target words unchanged?
Calculating Token Modifications in Pre-training
Critique of a Modified Pre-training Strategy
Purpose of Unchanged Tokens in BERT's MLM Strategy