Analyzing Training Loss in a Sequence Generation Task
An engineer is training a sequence-to-sequence model to correct grammatical errors in sentences. During training, they observe that for one specific long sentence, the model generates the first 90% of the words perfectly, but makes several significant errors in the final 10% of the sequence. Despite the high accuracy on the initial part of the sentence, the total calculated loss for this training example is surprisingly high. Based on the typical method for calculating loss in such tasks, explain this observation.
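The observation follows from how sequence loss is typically computed: a per-token cross-entropy (negative log-likelihood of the correct token) is calculated at every position and then summed (or averaged) over the whole sequence, so a few very-low-probability tokens at the end can dominate the total even when most tokens are predicted well. A minimal sketch, using hypothetical per-token probabilities for a 20-token sentence (all numbers are illustrative assumptions, not model output):

```python
import math

# Hypothetical probabilities the model assigns to each correct target token
# in a 20-token sentence: the first 18 tokens (90%) predicted confidently,
# the last 2 tokens (10%) predicted very poorly.
token_probs = [0.95] * 18 + [0.02, 0.01]

# Standard sequence loss: sum of per-token cross-entropy terms -log p(token)
per_token_nll = [-math.log(p) for p in token_probs]
total_loss = sum(per_token_nll)

loss_first_90 = sum(per_token_nll[:18])  # 18 confident tokens contribute little
loss_last_10 = sum(per_token_nll[18:])   # 2 poor tokens dominate the total

print(f"total loss:     {total_loss:.2f}")
print(f"from first 90%: {loss_first_90:.2f}")
print(f"from last 10%:  {loss_last_10:.2f}")
```

Running this sketch shows the final two tokens contributing far more loss than the first eighteen combined, which is why near-perfect accuracy early in the sequence does not prevent a high total loss.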
Tags
Ch.1 Pre-training - Foundations of Large Language Models