Analyzing Training Loss in a Sequence Generation Task
An engineer is training a sequence-to-sequence model to correct grammatical errors in sentences. During training, they observe that for one specific long sentence, the model generates the first 90% of the words perfectly, but makes several significant errors in the final 10% of the sequence. Despite the high accuracy on the initial part of the sentence, the total calculated loss for this training example is surprisingly high. Based on the typical method for calculating loss in such tasks, explain this observation.
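The observation follows from how sequence loss is typically computed: a per-token cross-entropy (negative log-likelihood of the correct token) is calculated at every position and then summed (or averaged) over the whole sequence, so a few very-low-probability tokens at the end can dominate the total even when most tokens are predicted well. A minimal sketch, using hypothetical per-token probabilities for a 20-token sentence (all numbers are illustrative assumptions, not model output):

```python
import math

# Hypothetical probabilities the model assigns to each correct target token
# in a 20-token sentence: the first 18 tokens (90%) predicted confidently,
# the last 2 tokens (10%) predicted very poorly.
token_probs = [0.95] * 18 + [0.02, 0.01]

# Standard sequence loss: sum of per-token cross-entropy terms -log p(token)
per_token_nll = [-math.log(p) for p in token_probs]
total_loss = sum(per_token_nll)

loss_first_90 = sum(per_token_nll[:18])  # 18 confident tokens contribute little
loss_last_10 = sum(per_token_nll[18:])   # 2 poor tokens dominate the total

print(f"total loss:     {total_loss:.2f}")
print(f"from first 90%: {loss_first_90:.2f}")
print(f"from last 10%:  {loss_last_10:.2f}")
```

Running this sketch shows the final two tokens contributing far more loss than the first eighteen combined, which is why near-perfect accuracy early in the sequence does not prevent a high total loss.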
Tags
Ch.1 Pre-training - Foundations of Large Language Models