Token Reordering as an Input Corruption Method
Token reordering, also known as token shuffling, is an input corruption technique used when training denoising autoencoders. The original input is corrupted by rearranging the order of its tokens, and the model is trained to reconstruct the original sequence. Because the corrupted input no longer carries reliable positional cues, the model is compelled to learn the underlying semantic content of the text independently of token position.
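A minimal sketch of how such a corruption step might look in practice (the function name and parameters are illustrative, not from any particular library); the corrupted sequence serves as the model input, and the original sequence serves as the reconstruction target:

```python
import random

def reorder_tokens(tokens, shuffle_prob=1.0, seed=None):
    """Corrupt a token sequence by shuffling its order.

    Returns a copy of `tokens` with the same elements in a
    (possibly) different order. The pair (corrupted, original)
    is then used as (input, target) in the denoising objective.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    # Optionally leave a fraction of inputs uncorrupted.
    if rng.random() < shuffle_prob:
        rng.shuffle(corrupted)
    return corrupted

original = ["the", "model", "learns", "semantic", "content"]
corrupted = reorder_tokens(original, seed=0)
# Training pair: input = corrupted sequence, target = original sequence
```

Because shuffling preserves the multiset of tokens, the model cannot rely on position alone and must recover the original order from the meaning of the words themselves.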
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Computing Sciences
Related
Token Masking as an Input Corruption Method
Token Deletion as an Input Corruption Method
Combining Multiple Corruption Methods in Pre-training
Selecting Appropriate Input Corruption Methods
Token Alteration as an Input Corruption Method
Input Corruption Methods for Multi-Sentence Sequences
Practice
An engineer is training a model whose task is to reconstruct an original sentence from a modified version of it. The engineer's primary goal is to force the model to learn the semantic meaning of the sentence, independent of the specific ordering of its words. Which of the following modification techniques, when applied to the input sentence, would be most effective for achieving this specific training objective?
A research team is pre-training an encoder-decoder model using a denoising objective. Their primary goal is to create a model that excels at summarizing long documents, which requires a deep understanding of the text's overall semantic content and logical flow, rather than its exact word-for-word structure. Which of the following input corruption strategies would be most aligned with this specific goal?
You are training an encoder-decoder model with a denoising objective. Match each input corruption method with the primary linguistic capability it is designed to teach the model.

Learn After
Diagnosing Pre-training Deficiencies
Comparing Input Alteration Techniques
Evaluating a Training Strategy for a Summarization Model
Example of Token Reordering in Denoising Autoencoding