Training the Decoder as a Language Model in 100% Masking Scenarios
In the specific case of Masked Language Modeling where 100% of the input tokens are masked, the training objective becomes equivalent to sequence generation: the input no longer provides any unmasked context, so the model must reconstruct the whole sentence from scratch. Consequently, the decoder is trained to operate as a language model, responsible for generating the entire original text.
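The contrast is easiest to see in how a single training example is constructed. The sketch below is an illustrative assumption rather than code from the course: the helper make_masked_example and the toy sentence are hypothetical. It builds an (encoder input, decoder target) pair for a masked-denoising objective at a 15% mask rate and at a 100% mask rate; at 100%, the encoder input carries no lexical content, so the decoder's target is simply the full original sequence.

```python
import random

def make_masked_example(tokens, mask_rate, mask_token="[MASK]"):
    # Replace a fraction of the tokens with [MASK] in the encoder input;
    # the decoder's target is always the full original sentence.
    n_to_mask = max(1, round(mask_rate * len(tokens)))
    masked = set(random.sample(range(len(tokens)), n_to_mask))
    encoder_input = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    decoder_target = list(tokens)
    return encoder_input, decoder_target

tokens = "the cat sat on the mat".split()

# 15% masking: the encoder still sees most of the sentence, so the decoder
# mostly learns to fill in a few gaps from the surrounding context.
print(make_masked_example(tokens, mask_rate=0.15))

# 100% masking: the encoder input contains no lexical information at all, so
# the decoder must generate the entire original sentence from scratch,
# which is exactly the job of an autoregressive language model.
print(make_masked_example(tokens, mask_rate=1.0))
```

At 15%, the decoder's predictions are heavily conditioned on the visible tokens; at 100%, it can condition only on its own previously generated tokens, which is the standard autoregressive language-modeling setup.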
Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences
Related
Training the Decoder as a Language Model in 100% Masking Scenarios
A language model is trained using an objective where every token in the input sentence is replaced by a [MASK] token. The model is then required to reconstruct the entire original sentence. How does the primary skill developed by this training method differ from a method where only a small fraction (e.g., 15%) of the tokens are masked?
Constructing a 100% Masked Training Example
Evaluating a Model Training Strategy
Learn After
Consider a text-infilling model that is typically trained by masking about 15% of the words in a sentence and having the model predict them based on the surrounding unmasked words. If this training process is modified to mask 100% of the words in every input sentence, what is the most significant change in the fundamental skill the model is being trained to perform?
Model Suitability for a Generation Task
Shift in Training Objective with 100% Masking