Multiple Choice

During a specific language model pre-training procedure, 15% of tokens in an input sequence are chosen for prediction. Of these chosen tokens, 80% are replaced by a special [MASK] symbol, 10% are replaced by a random token from the vocabulary, and 10% remain unchanged. What is the primary analytical reason for including the steps where tokens are replaced by a random token or left unchanged, instead of simply replacing 100% of the chosen tokens with the [MASK] symbol?
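To make the procedure in the question concrete, here is a minimal Python sketch of the 80/10/10 corruption scheme. The function name `mask_tokens`, the `MASK_TOKEN` constant, and the toy vocabulary are illustrative assumptions, not part of any particular library or the original pre-training code.

```python
import random

MASK_TOKEN = "[MASK]"  # the special mask symbol from the question

def mask_tokens(tokens, vocab, select_prob=0.15, seed=None):
    """Apply the 80/10/10 corruption scheme to a token sequence.

    Returns the corrupted sequence and the indices chosen for prediction.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = []
    for i in range(len(tokens)):
        if rng.random() < select_prob:      # choose ~15% of tokens for prediction
            targets.append(i)
            r = rng.random()
            if r < 0.8:                     # 80%: replace with [MASK]
                corrupted[i] = MASK_TOKEN
            elif r < 0.9:                   # 10%: replace with a random vocabulary token
                corrupted[i] = rng.choice(vocab)
            # remaining 10%: leave the token unchanged
    return corrupted, targets

# Example usage with a toy vocabulary
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_tokens(tokens, vocab, seed=0))
```

Note that in the 10% unchanged case the position is still a prediction target: the model must predict the original token there even though the input at that position was never corrupted.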

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science