Analysis of Language Model Training Objectives
A researcher is training a language model for a summarization task using [article, summary] pairs. They are considering two different methods for calculating the training loss:
- Method 1: The loss is calculated based on the model's predictions for all tokens in the concatenated [article, summary] sequence.
- Method 2: The loss is calculated based only on the model's predictions for the tokens in the summary part of the sequence.
For each method, identify whether it corresponds to optimizing a joint probability objective or a conditional probability objective. Then, explain the key difference in what the model is being trained to accomplish with each objective.
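The two loss calculations can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular framework's implementation: the helper `sequence_nll`, the mask variables, and the per-token probabilities are all hypothetical values chosen for the example. It assumes an autoregressive model, so each token's probability is conditioned on all preceding tokens. Under that assumption, summing the negative log-probabilities over every token gives -log P(article, summary), while summing only over the summary tokens gives -log P(summary | article).

```python
import math


def sequence_nll(token_log_probs, loss_mask):
    """Sum the negative log-probabilities at positions where loss_mask is 1."""
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("token_log_probs and loss_mask must have the same length")
    return -sum(lp for lp, m in zip(token_log_probs, loss_mask) if m)


# Hypothetical probabilities the model assigns to each correct token,
# each conditioned on all tokens before it (illustrative numbers only).
article_probs = [0.5, 0.25]    # article tokens
summary_probs = [0.5, 0.125]   # summary tokens, predicted after the article

log_probs = [math.log(p) for p in article_probs + summary_probs]
n_article = len(article_probs)
n_summary = len(summary_probs)

# Method 1: every token in [article, summary] contributes to the loss.
mask_all = [1] * (n_article + n_summary)

# Method 2: only the summary tokens contribute; article tokens are masked out.
mask_summary = [0] * n_article + [1] * n_summary

# Article-only mask, used here just to show how the two losses relate.
mask_article = [1] * n_article + [0] * n_summary

loss_method_1 = sequence_nll(log_probs, mask_all)
loss_method_2 = sequence_nll(log_probs, mask_summary)
article_nll = sequence_nll(log_probs, mask_article)

print(f"Method 1 loss: {loss_method_1:.4f}")
print(f"Method 2 loss: {loss_method_2:.4f}")
print(f"Article-only term: {article_nll:.4f}")
```

The Method 1 loss equals the article-only term plus the Method 2 loss, so the only difference between the two methods is whether the model is also penalized for failing to predict the article tokens themselves. In practice the sums are usually averaged over the unmasked tokens, which changes the scale of the loss but not which tokens are included.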
Related
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
A developer is fine-tuning a language model on a dataset of [instruction, response] pairs. Initially, the training process calculated the prediction loss across all tokens in both the instruction and the response. The developer then modifies the process to calculate loss only on the tokens in the response. What is the primary effect of this change on the model's training objective?
Analysis of Language Model Training Objectives
Selecting an Appropriate Language Model Training Objective