
Training the Decoder as a Language Model in 100% Masking Scenarios

In the special case of Masked Language Modeling where 100% of the input tokens are masked, the encoder input carries no information about the original text, so the training objective becomes equivalent to generating the sequence from scratch. The decoder is therefore trained to operate as a language model, responsible for generating the entire original text on its own.
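
The equivalence can be seen directly in how the training pair is constructed. Below is a minimal sketch, assuming an encoder-decoder setup in which the decoder is trained (with teacher forcing) to reproduce the full original sequence; the helper `make_denoising_pair`, the `[MASK]` token string, and the per-token masking scheme are illustrative assumptions, not details from the text above. At a masking ratio of 1.0 the encoder input degenerates into a contentless run of `[MASK]` tokens, and the decoder's objective reduces to ordinary language modeling.

```python
import random

MASK = "[MASK]"

def make_denoising_pair(tokens, mask_ratio, rng):
    """Build one encoder-decoder MLM training pair: the encoder sees the
    corrupted sequence, the decoder is trained to emit the original one."""
    corrupted = [MASK if rng.random() < mask_ratio else tok for tok in tokens]
    return corrupted, list(tokens)

rng = random.Random(0)
sentence = ["the", "cat", "sat", "on", "the", "mat"]

# Partial masking: the encoder input still carries content the decoder
# can condition on when reconstructing the text.
enc_in, dec_target = make_denoising_pair(sentence, mask_ratio=0.3, rng=rng)
print(enc_in)      # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'mat']

# 100% masking: random() < 1.0 always holds, so every token is replaced.
# The encoder input now carries no information about the text, and the
# decoder must generate the whole sentence from scratch, i.e. it is
# trained exactly like an autoregressive language model.
enc_in, dec_target = make_denoising_pair(sentence, mask_ratio=1.0, rng=rng)
print(enc_in)      # ['[MASK]', '[MASK]', '[MASK]', '[MASK]', '[MASK]', '[MASK]']
print(dec_target)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Note that only the pair construction changes between the two settings; the model and the cross-entropy loss over the decoder target stay the same, which is what makes the 100%-masking case reduce to plain language-model training.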
