Short Answer

Shift in Training Objective with 100% Masking

A model is being trained on a text corpus. In one training configuration, 15% of the words in each sentence are randomly replaced with a special [MASK] token, and the model must predict the original words. In a second configuration, 100% of the words are replaced with [MASK] tokens. Analyze how the fundamental task the model is learning to perform differs between these two configurations.
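To make the two configurations concrete, here is a minimal sketch of the masking step (a toy whitespace tokenizer and a hypothetical `mask_tokens` helper, not taken from the question itself). It shows that at 15% masking most of the sentence survives as conditioning context, while at 100% masking no context survives at all.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, seed=0):
    # Independently replace each token with [MASK] with probability mask_prob.
    # The training targets are the original tokens at the masked positions;
    # unmasked positions are not scored (target None).
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # model must recover this token
        else:
            masked.append(tok)
            targets.append(None)  # visible context, not predicted
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()

# Configuration 1: 15% masking -- predictions are conditioned on the
# surrounding visible words.
partial, partial_targets = mask_tokens(sentence, 0.15)

# Configuration 2: 100% masking -- every input position is [MASK], so the
# model sees no sentence-specific context and can at best learn positional
# word statistics of the corpus.
full, full_targets = mask_tokens(sentence, 1.0)
assert all(tok == MASK for tok in full)
```

With `mask_prob=1.0` the input carries no information about the particular sentence, which is the crux of the analysis the question asks for.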


Updated 2025-10-10


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science