Short Answer

Analysis of Language Model Training Objectives

A researcher is training a language model for a summarization task using [article, summary] pairs. They are considering two different methods for calculating the training loss:

  • Method 1: The loss is calculated based on the model's predictions for all tokens in the concatenated [article, summary] sequence.
  • Method 2: The loss is calculated based only on the model's predictions for the tokens in the summary part of the sequence.

For each method, identify whether it corresponds to optimizing a joint probability objective or a conditional probability objective. Then, explain the key difference in what the model is being trained to accomplish with each objective.
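
As a rough illustration of how the two methods differ mechanically, the sketch below computes a mean negative log-likelihood over a toy sequence using a loss mask. The probabilities, the helper name `sequence_loss`, and the mask layout are all hypothetical, chosen only for illustration; a real training setup would take these values from the model's output distribution.

```python
import math

def sequence_loss(token_probs, loss_mask):
    """Mean negative log-likelihood over the positions where loss_mask is 1."""
    nll = [-math.log(p) for p in token_probs]
    selected = [value for value, keep in zip(nll, loss_mask) if keep]
    return sum(selected) / len(selected)

# Hypothetical probabilities the model assigned to each correct next token.
article_probs = [0.5, 0.25, 0.5]   # tokens belonging to the article
summary_probs = [0.25, 0.125]      # tokens belonging to the summary
token_probs = article_probs + summary_probs

# Method 1: every token in the concatenated sequence contributes to the loss.
mask_all = [1] * len(token_probs)

# Method 2: article positions are masked out; only summary tokens contribute.
mask_summary = [0] * len(article_probs) + [1] * len(summary_probs)

loss_method_1 = sequence_loss(token_probs, mask_all)
loss_method_2 = sequence_loss(token_probs, mask_summary)

print(f"Method 1 loss: {loss_method_1:.4f}")
print(f"Method 2 loss: {loss_method_2:.4f}")
```

In both cases the model still reads the full [article, summary] sequence as input; the mask changes only which predictions are penalized, and therefore which gradients reach the parameters.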

Updated 2025-10-05

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science