Flexibility of Masked Language Modeling for Encoder-Decoder Training
The Masked Language Modeling (MLM) framework offers significant flexibility for training encoder-decoder models. Different training objectives can be created by adjusting two key parameters: the percentage of input tokens that are masked and the maximum length of the contiguous text spans replaced by a mask token. Tuning these lets the objective range from a BERT-style task with partial, single-token masking to a full language modeling task in which 100% of the input is masked and the decoder must generate the entire sequence.
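As a minimal illustration of this flexibility, the sketch below shows a generic span-corruption routine in Python. The function name `corrupt`, the single `[MASK]` sentinel, and the parameter names are hypothetical choices for this example (real systems such as T5 use a distinct sentinel token per span), but the two knobs it exposes, mask ratio and maximum span length, are exactly the ones described above.

```python
import random

MASK = "[MASK]"  # hypothetical single mask token; T5-style models use per-span sentinels


def corrupt(tokens, mask_ratio=0.15, max_span_len=1, rng=None):
    """Mask roughly `mask_ratio` of `tokens`, in contiguous spans of up to
    `max_span_len` tokens. Returns (encoder_input, decoder_target): the
    encoder sees the corrupted sequence, and the decoder is trained to
    reproduce the original one."""
    rng = rng or random.Random(0)
    target_count = max(1, round(len(tokens) * mask_ratio))
    corrupted = list(tokens)
    masked = 0
    while masked < target_count:
        span = rng.randint(1, max_span_len)          # pick a span length
        start = rng.randrange(len(tokens))           # pick a random start
        for i in range(start, min(start + span, len(tokens))):
            if corrupted[i] != MASK:                 # count each token once
                corrupted[i] = MASK
                masked += 1
    return corrupted, list(tokens)


tokens = "the scientist carefully poured the solution into the beaker".split()

# BERT-style objective: ~15% of tokens masked, one token per mask.
print(corrupt(tokens, mask_ratio=0.15, max_span_len=1))

# Full language modeling: 100% masking, so the decoder must generate
# the entire original sequence from scratch.
print(corrupt(tokens, mask_ratio=1.0, max_span_len=len(tokens)))
```

Moving the two parameters between these extremes interpolates between the settings: higher mask ratios and longer spans push the task away from local token prediction and toward open-ended generation.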
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Denoising Task with Consecutive Token Masking
Span-Based Denoising as an Encoder-Decoder Training Objective
Input Corruption Methods for Denoising Autoencoder Training
Denoising Autoencoder Training Objective
Loss Calculation for Encoder-Decoder Denoising Tasks
Training Efficiency in Denoising Autoencoding
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Example of a Denoising Autoencoder Task for Encoder-Decoder Models
BART Model's Use of Diverse Input Corruption Methods
An encoder-decoder model is being trained using the following example:
- Input to Encoder: "The scientist carefully [MASK] the solution into the beaker."
- Target Output for Decoder: "The scientist carefully poured the solution into the beaker."
Based on this training setup, what is the primary function of the decoder?
Evaluating a Model Training Objective
An encoder-decoder model is being trained with the objective of reconstructing a full, original sentence from an input version where several random words have been removed. What is the most critical function of the encoder's output in this specific training paradigm?
Corrupted Input for Encoder-Decoder Pre-training
Diagrammatic Example of an Encoder-Decoder Model Trained with a Denoising Autoencoding Objective
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
Example of Full Sequence Generation via 100% Masking
A research team is pre-training two separate encoder-decoder models using different variations of a masked language modeling objective.
- Model A is trained by masking 15% of the input tokens, with each mask covering only a single token. The model's objective is to predict the original token for each masked position.
- Model B is trained by masking 50% of the input tokens, with masks covering contiguous spans of up to 10 tokens. The model's objective is to predict the entire original text span.
Which of the following statements most accurately analyzes the likely capabilities these two models will develop based on their pre-training objectives?
Evaluating Pre-training Objectives for a Multi-Task Model
Match each masked language modeling (MLM) pre-training strategy for an encoder-decoder model with the primary capability it is designed to develop.