Shift in Training Objective with 100% Masking
A model is being trained on a text corpus. In one training configuration, 15% of the words in each sentence are randomly replaced with a special [MASK] token, and the model must predict the original words. In a second configuration, 100% of the words are replaced with [MASK] tokens. Analyze how the fundamental task the model is learning to perform differs between these two configurations.
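To make the contrast concrete, the two configurations can be sketched with a small masking routine. This is an illustrative helper (the function `mask_tokens` and the word-level tokenization are assumptions for the sketch, not part of the source): at a 15% rate most of the sentence survives as conditioning context, while at a 100% rate every position becomes [MASK] and no context remains.

```python
import random

def mask_tokens(tokens, mask_rate, mask_token="[MASK]", seed=0):
    """Replace a fraction of tokens with a mask token.

    Returns (masked, targets): `targets` maps each masked position
    to the original word the model must predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

sentence = "the cat sat on the mat".split()

# ~15% masking: most words survive, so prediction is contextual.
partial, partial_targets = mask_tokens(sentence, mask_rate=0.15)

# 100% masking: every word is replaced; the model sees only
# [MASK] tokens and their positions, with no lexical context.
full, full_targets = mask_tokens(sentence, mask_rate=1.0)
```

Note that at `mask_rate=1.0` the input carries no information beyond sentence length and position, which is what changes the nature of the prediction task.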
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider a text-infilling model that is typically trained by masking about 15% of the words in a sentence and having the model predict them based on the surrounding unmasked words. If this training process is modified to mask 100% of the words in every input sentence, what is the most significant change in the fundamental skill the model is being trained to perform?
Model Suitability for a Generation Task