Multiple Choice

During a language model's pre-training, a fraction of the input tokens selected for prediction is substituted with a completely random token from the vocabulary, rather than always being replaced with a special placeholder like [MASK]. What is the primary analytical justification for this specific strategy?
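
For concreteness, below is a minimal Python sketch of the corruption step the question describes, following the 15% selection rate and 80%/10%/10% split published with BERT (Devlin et al., 2019). The constants MASK_ID and VOCAB_SIZE and the helper mask_tokens are illustrative assumptions, not part of the question.

```python
import random

MASK_ID = 103        # assumed id of the [MASK] token (illustrative)
VOCAB_SIZE = 30522   # assumed vocabulary size (illustrative)

def mask_tokens(token_ids, select_prob=0.15, seed=None):
    """Return (inputs, labels). labels holds the original id at each
    selected position and -100 (an ignore index) everywhere else."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= select_prob:
            continue                # token not selected for prediction
        labels[i] = tok             # model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID     # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: leave the original token unchanged
    return inputs, labels

if __name__ == "__main__":
    ids = [2023, 2003, 1037, 7099, 6251]
    print(mask_tokens(ids, seed=0))
```

The rationale usually given for the random-token branch: since [MASK] never appears in downstream text, occasionally presenting a real (random or unchanged) token at a prediction position discourages the model from keying on the placeholder alone and pushes it to maintain an informative contextual representation of every input token.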

Updated 2025-09-29

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science