Concept

Loss Calculation for Encoder-Decoder Denoising Tasks

When training an encoder-decoder model on a denoising objective, the loss is computed over the entire target sequence. The encoder reads the corrupted input, and the decoder predicts the target sequence one token at a time. At each decoding step, a loss function, typically cross-entropy, measures the discrepancy between the model's predicted probability distribution over the next token and the ground-truth token at that position. The loss for the training example is then the sum or the average of these token-level losses over the full length of the target sequence.
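The calculation can be illustrated with a minimal sketch in plain Python. It assumes the decoder has already produced one vector of unnormalized scores (logits) per target position; the names `log_softmax`, `sequence_loss`, `step_logits`, and `target_ids` are hypothetical and chosen only for this illustration, and the toy numbers are not taken from any real model.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for one decoding step (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_loss(step_logits, target_ids, reduction="mean"):
    """Cross-entropy over a whole target sequence.

    step_logits: one list of vocabulary scores per target position
                 (assumed to come from the decoder).
    target_ids:  the ground-truth token id at each position.
    reduction:   "sum" or "mean" over the token-level losses.
    """
    if not target_ids:
        raise ValueError("target sequence must not be empty")
    if len(step_logits) != len(target_ids):
        raise ValueError("need one logit vector per target token")

    # Token-level loss: negative log-probability of the ground-truth token.
    token_losses = [
        -log_softmax(logits)[target]
        for logits, target in zip(step_logits, target_ids)
    ]

    total = sum(token_losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(token_losses)
    raise ValueError("reduction must be 'sum' or 'mean'")

# Toy example: vocabulary of 3 tokens, target sequence of 2 tokens.
# Uniform logits mean the model assigns probability 1/3 to every token.
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [2, 0]
summed = sequence_loss(toy_logits, toy_targets, reduction="sum")
averaged = sequence_loss(toy_logits, toy_targets, reduction="mean")
```

With uniform logits each token-level loss equals log(3), so the summed loss is twice the averaged loss. Averaging keeps the loss on a comparable scale across target sequences of different lengths, while summing weights longer targets more heavily; which one is used depends on the training setup.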

Updated 2026-04-16

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences