Learn Before
Training Encoder-Decoder Models with a Denoising Autoencoding Objective
The denoising autoencoding objective trains encoder-decoder models by requiring them to reconstruct an original, uncorrupted sequence from a corrupted input. As in a denoising autoencoder, the encoder processes the corrupted input (such as a sequence with masked tokens) and transforms it into a hidden representation; the decoder then uses this hidden representation to predict the original text. By learning to map a corrupted sequence to its uncorrupted counterpart, the model develops two key skills at once: the encoder learns to comprehend the input context, and the decoder learns to generate coherent text.
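As a concrete illustration, here is a minimal PyTorch sketch of one such training step: token-level masking serves as the corruption, and the model is trained with teacher forcing to reconstruct the clean sequence. The model size, vocabulary, token ids, and 15% mask rate are toy assumptions for illustration, not a prescribed setup.

```python
# Minimal sketch of a denoising autoencoding training step for an
# encoder-decoder Transformer. Hyperparameters, token ids, and the
# mask rate are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, D_MODEL, PAD_ID, MASK_ID = 1000, 64, 0, 1

class TinyEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)  # positional encodings omitted for brevity
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, corrupted, decoder_in):
        # Encoder maps the corrupted sequence to a hidden representation...
        memory = self.transformer.encoder(self.embed(corrupted))
        # ...which the decoder attends to while predicting the clean text.
        causal = self.transformer.generate_square_subsequent_mask(decoder_in.size(1))
        hidden = self.transformer.decoder(self.embed(decoder_in), memory,
                                          tgt_mask=causal)
        return self.out(hidden)

def corrupt(tokens, mask_rate=0.15):
    # Token-level corruption: replace a random subset with [MASK].
    noisy = tokens.clone()
    noisy[torch.rand(tokens.shape) < mask_rate] = MASK_ID
    return noisy

model = TinyEncoderDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.randint(2, VOCAB, (8, 16))   # stand-in for real token ids
corrupted = corrupt(clean)
# Teacher forcing: the decoder sees the clean sequence shifted right.
decoder_in = torch.cat([torch.full((8, 1), PAD_ID), clean[:, :-1]], dim=1)

logits = model(corrupted, decoder_in)
# Reconstruction loss: cross-entropy against the original, uncorrupted tokens.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), clean.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

Real pre-training recipes such as BART's add positional encodings, special tokens, and a wider range of corruption methods (see "BART Model's Use of Diverse Input Corruption Methods" under Learn After).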

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Introduction to Denoising Autoencoders
Vector Field of Denoising Autoencoders
History of MLPs for Denoising
An engineer trains two autoencoder models on a large dataset of clean, high-resolution images. Model A is a standard autoencoder, trained to reconstruct the original images perfectly. Model B is a denoising autoencoder, trained to reconstruct the original clean images from input images that have been intentionally corrupted with random noise (e.g., salt-and-pepper noise). After training, both models are evaluated on their ability to reconstruct a new set of images that have a different, unseen type of corruption (e.g., a slight blur). Based on their training objectives, which model is expected to perform better on this new task, and why?
A key modification to the standard autoencoder training process is the introduction of a 'corruption' step to create a more robust model. Arrange the following steps to accurately describe a single training iteration for this modified approach, which aims to reconstruct an original data point from a noisy version of it.
An autoencoder model is trained on a large dataset of facial images. During each training step, a clean image (x) is taken, a random rectangular section of it is completely blacked out to create a corrupted version (~x), and the model is tasked with reconstructing the original, clean image (x) from the corrupted input (~x). Which of the following best explains what the model must learn about the data distribution to succeed at this specific task?
A research team is pre-training a language model with the specific goal of making it highly proficient at understanding long-range contextual relationships and the logical flow of arguments within a paragraph. They use a method where the model learns to restore an original, clean text from a deliberately corrupted version. Which of the following corruption strategies applied to the input text would be most effective for achieving the team's specific goal?
Designing a Robust Text Correction Model
Analyzing the Impact of Input Corruption
Example of Span Masking in Denoising Autoencoding
Example of Sentinel Masking in Denoising Autoencoding
Learn After
Example of Denoising Task with Consecutive Token Masking
Span-Based Denoising as an Encoder-Decoder Training Objective
Input Corruption Methods for Denoising Autoencoder Training
Denoising Autoencoder Training Objective
Loss Calculation for Encoder-Decoder Denoising Tasks
Training Efficiency in Denoising Autoencoding
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Example of a Denoising Autoencoder Task for Encoder-Decoder Models
BART Model's Use of Diverse Input Corruption Methods
An encoder-decoder model is being trained using the following example:
- Input to Encoder: "The scientist carefully [MASK] the solution into the beaker."
- Target Output for Decoder: "The scientist carefully poured the solution into the beaker."
Based on this training setup, what is the primary function of the decoder?
Evaluating a Model Training Objective
An encoder-decoder model is being trained with the objective of reconstructing a full, original sentence from an input version where several random words have been removed. What is the most critical function of the encoder's output in this specific training paradigm?
Corrupted Input for Encoder-Decoder Pre-training
Diagrammatic Example of an Encoder-Decoder Model Trained with a Denoising Autoencoding Objective