Input Corruption Methods for Multi-Sentence Sequences
For text inputs that span multiple sentences, standard token-level corruption can be supplemented with sentence-level techniques. BART, for example, corrupts multi-sentence documents with two such methods: sentence permutation, which shuffles the order of the sentences, and document rotation, which rotates the document so that it begins at a randomly chosen token. Both force the model to learn inter-sentence coherence and the overall structure of a document, rather than only word-level relationships.
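The two sentence-level corruptions can be sketched as follows. This is a minimal illustration, not BART's actual implementation; the function names and the use of pre-split sentence lists are assumptions for clarity.

```python
import random


def sentence_permutation(sentences, rng=random):
    """Shuffle the order of sentences in a document.

    The denoising objective then asks the model to restore the
    original sentence order, teaching inter-sentence coherence.
    """
    shuffled = sentences[:]  # copy so the input is not mutated
    rng.shuffle(shuffled)
    return shuffled


def document_rotation(tokens, rng=random):
    """Rotate a token sequence to start at a randomly chosen token.

    The model must identify the true start of the document and
    reconstruct the original ordering.
    """
    pivot = rng.randrange(len(tokens))
    return tokens[pivot:] + tokens[:pivot]


# Example: corrupt a three-sentence document.
doc = ["The sky darkened.", "Rain began to fall.", "Everyone ran inside."]
corrupted = sentence_permutation(doc, rng=random.Random(0))
rotated = document_rotation("the cat sat on the mat".split(),
                            rng=random.Random(0))
```

Note that both operations preserve the document's content exactly; only the ordering is disturbed, which is what lets the original text serve as the reconstruction target.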