Loss Calculation for Encoder-Decoder Denoising Tasks
When training an encoder-decoder model on a denoising objective, the loss is calculated across the entire output sequence. The decoder generates the target sequence one token at a time. At each generation step, a loss function, typically cross-entropy, measures the discrepancy between the model's predicted probability distribution for the next token and the actual ground-truth token. The total loss for the training example is then computed by summing or averaging these individual token-level losses over the full length of the target sequence.
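The computation described above can be sketched in plain Python. This is a minimal illustration, not a real decoder: the vocabulary, target IDs, and per-step probability vectors are all hypothetical stand-ins for what a trained model would produce.

```python
import math

# Hypothetical toy vocabulary and a 3-token ground-truth target sequence.
vocab = ["<pad>", "the", "fox", "jumps", "poured"]
target_ids = [1, 2, 3]  # "the fox jumps"

# Hypothetical predicted distributions from the decoder, one probability
# vector over the vocabulary per generation step (teacher-forced).
predicted_probs = [
    [0.05, 0.80, 0.05, 0.05, 0.05],  # step 1: highest mass on "the"
    [0.05, 0.05, 0.70, 0.10, 0.10],  # step 2: highest mass on "fox"
    [0.10, 0.10, 0.10, 0.60, 0.10],  # step 3: highest mass on "jumps"
]

# Token-level cross-entropy: -log p(ground-truth token) at each step.
token_losses = [
    -math.log(probs[t]) for probs, t in zip(predicted_probs, target_ids)
]

# Total loss for this training example: sum (or average) of the
# token-level losses over the full target sequence.
total_loss = sum(token_losses)
mean_loss = total_loss / len(token_losses)
```

In practice a framework's cross-entropy function operates on unnormalized logits rather than explicit probability vectors, but the sum-or-mean reduction over target positions is the same.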

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Denoising Task with Consecutive Token Masking
Span-Based Denoising as an Encoder-Decoder Training Objective
Input Corruption Methods for Denoising Autoencoder Training
Denoising Autoencoder Training Objective
Training Efficiency in Denoising Autoencoding
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Example of a Denoising Autoencoder Task for Encoder-Decoder Models
BART Model's Use of Diverse Input Corruption Methods
An encoder-decoder model is being trained using the following example:
- Input to Encoder: "The scientist carefully [MASK] the solution into the beaker."
- Target Output for Decoder: "The scientist carefully poured the solution into the beaker."
Based on this training setup, what is the primary function of the decoder?
Evaluating a Model Training Objective
An encoder-decoder model is being trained with the objective of reconstructing a full, original sentence from an input version where several random words have been removed. What is the most critical function of the encoder's output in this specific training paradigm?
Corrupted Input for Encoder-Decoder Pre-training
Diagrammatic Example of an Encoder-Decoder Model Trained with a Denoising Autoencoding Objective
Learn After
An encoder-decoder model is being trained on a denoising task. Its goal is to reconstruct an original sentence from a corrupted version. During one training step, the model must generate the target sentence: 'The quick brown fox jumps.' The model generates the following output, one word at a time: 'The quick brown foxx jumps.' Based on how the training loss is typically computed for this type of task, which statement best describes how the error signal is calculated?
You are training an encoder-decoder model on a denoising task. For a single training example, arrange the following steps in the correct order to describe how the total loss is calculated for the target output sequence.
Analyzing Training Loss in a Sequence Generation Task