Short Answer

Analyzing Training Loss in a Sequence Generation Task

An engineer is training a sequence-to-sequence model to correct grammatical errors in sentences. During training, they observe that for one specific long sentence, the model generates the first 90% of the words perfectly, but makes several significant errors in the final 10% of the sequence. Despite the high accuracy on the initial part of the sentence, the total calculated loss for this training example is surprisingly high. Based on the typical method for calculating loss in such tasks, explain this observation.
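A minimal sketch of the standard token-level cross-entropy calculation can make the observation concrete. Sequence losses are typically computed per target token and then summed (or averaged) over the whole sequence, so a few confidently wrong predictions near the end contribute very large `-log p` terms that dominate the total. The probabilities below are hypothetical, chosen only to illustrate the effect:

```python
import math

# Hypothetical probabilities the model assigns to the CORRECT word
# at each position of a 10-token target sequence. The first 9 tokens
# are predicted almost perfectly; the final token is a confident error.
probs = [0.99] * 9 + [0.001]

# Standard sequence loss: sum of -log p(correct token) over positions.
per_token_loss = [-math.log(p) for p in probs]
total_loss = sum(per_token_loss)

print(f"loss from first 9 tokens: {sum(per_token_loss[:9]):.3f}")
print(f"loss from final token:    {per_token_loss[-1]:.3f}")
print(f"total loss:               {total_loss:.3f}")
```

Because `-log(0.99)` is tiny while `-log(0.001)` is large, the single mispredicted token accounts for almost all of the total loss, mirroring how errors in the last 10% of a long sentence can swamp near-perfect accuracy on the first 90%.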

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science