A researcher is training an encoder-decoder model using a denoising objective, where the model learns to reconstruct an original sentence from a corrupted version. Arrange the following steps of a single training iteration in the correct chronological order.
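The steps being ordered can be sketched as one toy iteration. This is a hedged outline, assuming a T5-style token-masking corruption; the model-dependent steps (encoding, decoding, parameter update) appear only as comments, since they require a real network:

```python
import random

def corrupt(tokens, mask_rate=0.3, seed=0):
    """Step 1: corrupt the clean sentence by masking random tokens."""
    rng = random.Random(seed)
    return ["[M]" if rng.random() < mask_rate else t for t in tokens]

original = "The quick brown fox jumps over the lazy dog".split()
corrupted = corrupt(original)

# Step 2: the encoder reads the corrupted input.
# Step 3: the decoder generates the reconstruction one token at a time,
#         conditioned on the encoder output (during training, teacher
#         forcing feeds it the true previous tokens).
# Step 4: a loss is computed for each generated token against `original`.
# Step 5: gradients are backpropagated and the parameters are updated.
```

The corruption function here is illustrative; real implementations often replace contiguous spans with a single sentinel token rather than masking tokens independently.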
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing a Denoising Training Process
An encoder-decoder model is trained on a denoising task. It receives a corrupted input like

The quick [M] fox jumps [M] the lazy dog.

and must generate the original, complete sentence

The quick brown fox jumps over the lazy dog.

The decoder generates the output one word at a time. Why is the training loss typically calculated for each word the decoder generates, rather than just a single loss for the entire completed sentence?
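A minimal numeric sketch of the per-token view, using made-up probabilities rather than real model output: the sequence loss decomposes into one -log p term per generated word, so every decoding step receives its own direct gradient signal instead of a single diluted one.

```python
import math

# Illustrative probabilities the decoder assigns to the CORRECT word
# at each of four output positions (invented numbers, not model output).
probs_of_correct_word = [0.9, 0.6, 0.2, 0.8]

# Per-token cross-entropy: each position contributes its own -log p term,
# so a badly predicted word (here position 2, with p = 0.2) is penalized
# directly at the step where the error occurred.
per_token_loss = [-math.log(p) for p in probs_of_correct_word]

# The sequence loss is simply the mean (or sum) of the per-token terms.
sequence_loss = sum(per_token_loss) / len(per_token_loss)
```

Because the total is a sum of per-position terms, backpropagation attributes blame to the specific step that went wrong, which is the usual motivation for per-token rather than whole-sentence loss.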