Case Study

Evaluating Training Strategies for a Translation Model

A team is training a model to translate sentences. They are debating two methods for calculating the error and updating the model for each training sentence:

Method 1: For a given sentence, the model generates the full translation. The error for each word in the generated translation is calculated, and these per-word errors are summed. This single total error value is then used to adjust the model's parameters once.
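Method 1 can be written compactly as follows. This is only a sketch under assumptions the case study does not state: the translation has T words, \ell_t is the error for word t (for example, a cross-entropy term), and the parameters \theta are adjusted by plain gradient descent with a learning rate \eta.

```latex
\mathcal{L}(\theta) = \sum_{t=1}^{T} \ell_t(\theta),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta)
```

Every per-word term \ell_t is evaluated at the same parameter values, and the parameters change once per sentence.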

Method 2: For a given sentence, the model generates the first word of the translation. The error for this single word is calculated, and the model's parameters are immediately adjusted. Then, it generates the second word, calculates the error, and adjusts the parameters again. This process repeats for every word in the translation.
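The mechanical difference between the two methods can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the team's actual model: the "model" is a single weight `w` that predicts `w * x`, the per-word error is a squared error, the update rule is plain gradient descent, and the function names and data are hypothetical.

```python
# Toy stand-in for the two update schemes (all names and data are hypothetical).
# "Model": one weight w predicting w * x; per-word error is (w * x - y) ** 2.

def word_error_grad(w, x, y):
    """Gradient of the per-word error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x

def method_1_update(w, sentence, lr):
    """Sum the per-word gradients at a fixed w, then apply one update."""
    total_grad = sum(word_error_grad(w, x, y) for x, y in sentence)
    return w - lr * total_grad

def method_2_update(w, sentence, lr):
    """Update w after every word, so later words see an already-changed w."""
    for x, y in sentence:
        w = w - lr * word_error_grad(w, x, y)
    return w

# Made-up (input, target) pairs standing in for the words of one sentence.
sentence = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w0, lr = 0.0, 0.01

w_m1 = method_1_update(w0, sentence, lr)  # one update per sentence
w_m2 = method_2_update(w0, sentence, lr)  # one update per word
print("Method 1:", w_m1)
print("Method 2:", w_m2)
```

Starting from the same weight and the same sentence, the two schemes end at different parameter values, because in the second scheme each word's error is measured against a model that has already been changed by the preceding words.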

Analyze these two methods. Which one is the more appropriate way to calculate error over a complete data sequence, and why is it generally more effective for training models on sequential tasks such as translation?

Updated 2025-10-06

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science