Loss Calculation for Encoder-Decoder Denoising Tasks
When training an encoder-decoder model on a denoising objective, the loss is calculated across the entire output sequence. The decoder generates the target sequence one token at a time. At each generation step, a loss function, typically cross-entropy, measures the discrepancy between the model's predicted probability distribution for the next token and the actual ground-truth token. The total loss for the training example is then computed by summing or averaging these individual token-level losses over the full length of the target sequence.
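The computation described above can be sketched in plain Python. This is a minimal illustration, not a real decoder: the vocabulary, target IDs, and per-step probability vectors are all hypothetical stand-ins for what a trained model would produce.

```python
import math

# Hypothetical toy vocabulary and a 3-token ground-truth target sequence.
vocab = ["<pad>", "the", "fox", "jumps", "poured"]
target_ids = [1, 2, 3]  # "the fox jumps"

# Hypothetical predicted distributions from the decoder, one probability
# vector over the vocabulary per generation step (teacher-forced).
predicted_probs = [
    [0.05, 0.80, 0.05, 0.05, 0.05],  # step 1: highest mass on "the"
    [0.05, 0.05, 0.70, 0.10, 0.10],  # step 2: highest mass on "fox"
    [0.10, 0.10, 0.10, 0.60, 0.10],  # step 3: highest mass on "jumps"
]

# Token-level cross-entropy: -log p(ground-truth token) at each step.
token_losses = [
    -math.log(probs[t]) for probs, t in zip(predicted_probs, target_ids)
]

# Total loss for this training example: sum (or average) of the
# token-level losses over the full target sequence.
total_loss = sum(token_losses)
mean_loss = total_loss / len(token_losses)
```

In practice a framework's cross-entropy function operates on unnormalized logits rather than explicit probability vectors, but the sum-or-mean reduction over target positions is the same.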

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Denoising Task with Consecutive Token Masking
Span-Based Denoising as an Encoder-Decoder Training Objective
Input Corruption Methods for Denoising Autoencoder Training
Denoising Autoencoder Training Objective
Training Efficiency in Denoising Autoencoding
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Example of a Denoising Autoencoder Task for Encoder-Decoder Models
BART Model's Use of Diverse Input Corruption Methods
An encoder-decoder model is being trained using the following example:
- Input to Encoder: "The scientist carefully [MASK] the solution into the beaker."
- Target Output for Decoder: "The scientist carefully poured the solution into the beaker."
Based on this training setup, what is the primary function of the decoder?
Evaluating a Model Training Objective
An encoder-decoder model is being trained with the objective of reconstructing a full, original sentence from an input version where several random words have been removed. What is the most critical function of the encoder's output in this specific training paradigm?
Corrupted Input for Encoder-Decoder Pre-training
Diagrammatic Example of an Encoder-Decoder Model Trained with a Denoising Autoencoding Objective
Learn After
An encoder-decoder model is being trained on a denoising task. Its goal is to reconstruct an original sentence from a corrupted version. During one training step, the model must generate the target sentence: 'The quick brown fox jumps.' The model generates the following output, one word at a time: 'The quick brown foxx jumps.' Based on how the training loss is typically computed for this type of task, which statement best describes how the error signal is calculated?
You are training an encoder-decoder model on a denoising task. For a single training example, arrange the following steps in the correct order to describe how the total loss is calculated for the target output sequence.
Analyzing Training Loss in a Sequence Generation Task