Flexibility of Masked Language Modeling for Encoder-Decoder Training

The Masked Language Modeling (MLM) framework offers considerable flexibility for training encoder-decoder models. Different training objectives arise from adjusting two key parameters: the masking ratio (the percentage of tokens that are masked) and the maximum length of the text spans replaced by a single mask token. By varying these settings, the objective can range from BERT-style partial masking to a full language modeling task in which the decoder must generate the entire sequence.
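To make this concrete, below is a minimal sketch of a span-corruption routine in this spirit. The function `span_corrupt` and its parameter names (`mask_ratio`, `max_span_len`, `mask_token`) are hypothetical illustrations, not from the source; the sketch assumes the input is a plain list of tokens and shows how the two parameters interpolate between the two regimes.

```python
import random

def span_corrupt(tokens, mask_ratio=0.15, max_span_len=3, mask_token="<mask>"):
    """Replace random token spans with a single mask token (hypothetical helper).

    Returns (corrupted, targets): the encoder reads `corrupted`,
    and the decoder is trained to produce the masked-out `targets`.
    """
    n = len(tokens)
    budget = max(1, int(round(n * mask_ratio)))  # total number of tokens to mask
    masked = [False] * n
    attempts = 0
    while budget > 0 and attempts < 10 * n:  # guard against rare placement failures
        attempts += 1
        span = random.randint(1, min(max_span_len, budget))
        start = random.randrange(n - span + 1)
        if any(masked[start:start + span]):
            continue  # skip placements that overlap an existing span
        for i in range(start, start + span):
            masked[i] = True
        budget -= span

    corrupted, targets = [], []
    i = 0
    while i < n:
        if masked[i]:
            corrupted.append(mask_token)      # one mask token per contiguous span
            while i < n and masked[i]:
                targets.append(tokens[i])     # decoder must recover these tokens
                i += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets

tokens = "the cat sat on the mat".split()

# BERT-style objective: mask roughly 15% of tokens, single-token spans.
print(span_corrupt(tokens, mask_ratio=0.15, max_span_len=1))

# Full language modeling: every token is masked, so the contiguous
# masked run collapses to a single mask token and the decoder must
# generate the entire sequence from scratch.
print(span_corrupt(tokens, mask_ratio=1.0, max_span_len=len(tokens)))
```

With a low masking ratio and single-token spans, the corrupted input closely resembles BERT-style masking; pushing the ratio to 1.0 leaves the encoder with essentially no content, so training reduces to generating the full sequence, matching the range of objectives described above.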
