Multiple Choice

An encoder-decoder model is trained on a denoising task. It receives a corrupted input such as "The quick [M] fox jumps [M] the lazy dog." (where [M] is a mask token) and must generate the original, complete sentence "The quick brown fox jumps over the lazy dog." The decoder generates the output one word at a time. Why is the training loss typically calculated for each word the decoder generates, rather than as a single loss for the entire completed sentence?
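To make the distinction concrete, here is a minimal sketch of the per-token loss the question describes. The word list and probability values are hypothetical, illustrative numbers, not outputs of a real model: each entry stands for the decoder's predicted probability of the correct word at that position.

```python
import math

# Hypothetical target sentence and the decoder's (illustrative) predicted
# probability for the correct word at each position.
target = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
p_correct = [0.9, 0.8, 0.3, 0.85, 0.7, 0.4, 0.9, 0.6, 0.95]

# Per-token cross-entropy: negative log-probability of each correct word.
per_token_loss = [-math.log(p) for p in p_correct]

# The training loss is the sum (or mean) over positions, so every position
# contributes its own gradient signal. Hard positions like "brown" (p=0.3)
# and "over" (p=0.4), which fill in the [M] masks, dominate the loss,
# whereas a single sentence-level score could not tell the model which
# individual words were wrong.
total_loss = sum(per_token_loss)
```

Because the loss decomposes by position, gradients flow back through every decoding step, which is the usual motivation for per-token (rather than whole-sentence) supervision.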


Updated 2025-09-26


Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science