Learn Before
Evaluating Training Strategies for a Translation Model
A team is training a model to translate sentences. They are debating two methods for calculating the error and updating the model for each training sentence:
Method 1: For a given sentence, the model generates the full translation. The errors for each individual word in the generated translation are then summed up. This single, total error value is used to adjust the model's parameters.
Method 2: For a given sentence, the model generates the first word of the translation. The error for this single word is calculated, and the model's parameters are immediately adjusted. Then, it generates the second word, calculates the error, and adjusts the parameters again. This process repeats for every word in the translation.
Analyze these two methods. Which method is the more appropriate way to calculate the error over a complete data sequence, and why is it generally more effective for training models on sequential tasks such as translation?
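To make the contrast concrete, the two update schedules can be sketched with a toy gradient-descent loop. This is a minimal illustration, not part of the original question: the quadratic per-token loss, its gradient, the targets, and the learning rate are all hypothetical stand-ins for a real translation model's loss.

```python
# Toy contrast of the two update schedules from the question.
# The per-token loss and gradient below are hypothetical stand-ins.

def token_loss(param, target):
    # A simple quadratic loss for one "token".
    return (param - target) ** 2

def token_grad(param, target):
    # Gradient of the quadratic loss with respect to the parameter.
    return 2 * (param - target)

targets = [0.1, 0.3, 0.2, 0.4]  # one assumed "target" per token
lr = 0.05                        # assumed learning rate

# Method 1: generate the whole sequence, SUM the per-token gradients,
# then make a single parameter update for the sentence.
param = 0.0
total_grad = sum(token_grad(param, t) for t in targets)
param_method1 = param - lr * total_grad

# Method 2: update the parameter IMMEDIATELY after each token,
# so later tokens see a parameter already shifted by earlier ones.
param = 0.0
for t in targets:
    param = param - lr * token_grad(param, t)
param_method2 = param

print(param_method1, param_method2)
```

Note how Method 2 ends at a different parameter value: each intermediate update changes the point at which the next token's gradient is evaluated, which is exactly why per-token immediate updates do not implement a sequence-level loss.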
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Loss Function for RNN
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
Cross-Entropy Loss for Knowledge Distillation
A language model is being trained to generate the four-word sentence 'The quick brown fox'. The model generates one word at a time, and the error (loss) is calculated at each step:
- Loss for 'The' = 0.1
- Loss for 'quick' = 0.3
- Loss for 'brown' = 0.2
- Loss for 'fox' = 0.4
To update the model's parameters, the training process computes a single, overall loss value for the entire sentence. Which statement best analyzes this method of calculating the overall loss?
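As a minimal sketch of the calculation described above, the per-token losses from the question can be combined into a single sequence-level value by summing, and optionally normalized by the sequence length to give an average per-token loss:

```python
# Per-word losses given in the question.
token_losses = [0.1, 0.3, 0.2, 0.4]

# Sequence-level loss: the sum of all per-token losses.
total_loss = sum(token_losses)

# Length-normalized variant: the average loss per token.
mean_loss = total_loss / len(token_losses)

print(total_loss, mean_loss)
```

Summing yields one overall loss of 1.0 for the sentence (average 0.25 per token); it is this single value, not the individual token losses, that drives the parameter update.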
Total Loss Calculation for a Token Sequence
Calculating Average Sequence-Level Loss
Evaluating Training Strategies for a Translation Model