Learn Before
Rationale for Averaging Time-Step Losses
In a model that processes sequential data, why is the overall performance typically evaluated by averaging the error from each individual step in the sequence, rather than by only considering the error at the very last step?
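To make the comparison in the question concrete, here is a minimal sketch that contrasts the two evaluation strategies. It assumes the per-step losses have already been computed. The function names `sequence_loss` and `last_step_loss` and the example values are illustrative, not taken from any particular library.

```python
def sequence_loss(step_losses):
    """Average the per-step losses so every time step contributes equally."""
    if not step_losses:
        raise ValueError("need at least one time step")
    return sum(step_losses) / len(step_losses)


def last_step_loss(step_losses):
    """Alternative strategy: look only at the final time step."""
    if not step_losses:
        raise ValueError("need at least one time step")
    return step_losses[-1]


# Hypothetical losses for a 4-step sequence (illustrative values only).
step_losses = [0.2, 0.5, 0.1, 0.4]

mean_loss = sequence_loss(step_losses)
final_only = last_step_loss(step_losses)

print(f"averaged over steps: {mean_loss:.2f}")
print(f"last step only:      {final_only:.2f}")
```

In this sketch the large error at the second step (0.5) affects the averaged value but is invisible to the last-step value, which is one way to see why averaging reflects behavior across the whole sequence.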
Tags
Data Science
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Backpropagation Through Time (BPTT)
A model designed to process sequential data is evaluated on a sequence of 4 time steps. The loss (error) is calculated independently at each time step, yielding the following values: [0.2, 0.5, 0.1, 0.4]. Based on the standard method for computing the total loss for the entire sequence, what is the final loss value?
Evaluating Loss Calculation Strategies
Rationale for Averaging Time-Step Losses