Conditional vs. Joint Probability Objectives in Language Modeling
A fundamental difference exists between standard language modeling and supervised fine-tuning objectives. Standard language modeling minimizes the loss over all tokens of a concatenated input-output sequence [x, y], optimizing the joint log-probability log Pr([x, y]). In contrast, fine-tuning focuses on the conditional log-probability of the output, log Pr(y|x). By applying the chain rule, the joint sequence probability is decomposed into the probability of the input and the conditional probability of the output: log Pr([x, y]) = log Pr(x) + log Pr(y|x). In the fine-tuning context, the loss computed over the input x is set to 0, meaning the loss is computed exclusively for the output tokens y.
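The masking described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full training loop: the per-token log-probabilities below are made-up values standing in for a model's outputs, and the split (3 input tokens, 2 output tokens) is an assumption for the example.

```python
import math

def sequence_loss(token_log_probs, loss_mask):
    """Negative log-likelihood summed over positions where loss_mask is 1."""
    return -sum(lp * m for lp, m in zip(token_log_probs, loss_mask))

# Hypothetical per-token log-probabilities for the concatenated sequence [x, y],
# where x contributes the first 3 tokens and y the last 2.
log_probs = [-0.2, -0.5, -0.3, -0.7, -0.4]

# Standard language modeling: loss over every token of [x, y]
# (optimizes the joint log-probability log Pr([x, y])).
lm_mask = [1, 1, 1, 1, 1]

# Supervised fine-tuning: loss over the input x is set to 0, so only the
# output tokens y contribute (optimizes the conditional log Pr(y|x)).
sft_mask = [0, 0, 0, 1, 1]

joint_nll = sequence_loss(log_probs, lm_mask)         # -log Pr([x, y])
conditional_nll = sequence_loss(log_probs, sft_mask)  # -log Pr(y|x)

# Chain rule check: log Pr([x, y]) = log Pr(x) + log Pr(y|x), so the two
# losses differ by exactly -log Pr(x) = 0.2 + 0.5 + 0.3 = 1.0.
assert math.isclose(joint_nll - conditional_nll, 1.0)
```

In practice, deep-learning frameworks implement the same idea by masking the input positions out of the loss (for example, via an ignore label on prompt tokens) rather than multiplying by an explicit 0/1 mask, but the objective is identical.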

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Related
Mathematical Formulation of LLM Inference
Equivalence of Maximizing Auto-regressive Log-Likelihood and Minimizing Cross-Entropy Loss
Conditional vs. Joint Probability Objectives in Language Modeling
Notational Convention for Autoregressive Conditional Probability
Modeling and Efficient Computation of Conditional Token Probabilities
A language model is generating a response sequence 'y' given an input context 'x'. The model generates the two-token sequence y = ('deep', 'learning'). The model's calculated log-probabilities for each step of the generation are as follows:
- Log-probability of the first token: log Pr(y₁='deep' | x) = -0.7
- Log-probability of the second token, given the first: log Pr(y₂='learning' | x, y₁='deep') = -0.4
Based on the standard method for calculating the probability of a full sequence, what is the total conditional log-likelihood of the entire sequence 'y', i.e., log Pr(y|x)?
Comparing Model Confidence via Log-Likelihood
Analyzing a Flawed Log-Likelihood Calculation
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
General Language Modeling Objective based on Joint Log-Probability
A language model is being used to determine the likelihood of a specific sentence. Let the input sequence x be 'The sun is' and the output sequence y be 'shining brightly'. The notation Pr([x, y]) represents the probability of the model generating the full, combined sequence. Which statement best analyzes what this probability value signifies?
Analysis of Sequence Order on Joint Probability
Conditional Log-Probability via Joint and Marginal Log-Probabilities
Model Comparison Using Joint Sequence Probability
A language model is being trained with the objective of modeling the joint probability of an input sequence x and an output sequence y, which are treated as a single, concatenated sequence. During a single training step for this combined sequence, how is the model's performance error (loss) calculated?
Evaluating a Training Objective for a Base Model
A language model is being trained with the objective of modeling the joint probability of a combined sequence [x, y]. For this objective, the model's parameters are updated based only on its ability to correctly predict the tokens in the output sequence y.
Maximum Likelihood Estimation (MLE) as the Objective for Supervised Fine-Tuning
A development team is fine-tuning a pre-trained language model using a curated dataset of customer support inquiries (inputs) and their corresponding ideal, human-written responses (outputs). The aim is to create a specialized chatbot that reliably provides answers in the same helpful and accurate style as the examples. From a probabilistic perspective, which statement best describes the fundamental objective of this training process?
Correcting a Flawed Fine-Tuning Objective
Objective for a Specialized Math Tutor
Mathematical Formulation of the Supervised Fine-Tuning Objective
Learn After
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
A developer is fine-tuning a language model on a dataset of [instruction, response] pairs. Initially, the training process calculated the prediction loss across all tokens in both the instruction and the response. The developer then modifies the process to calculate loss only on the tokens in the response. What is the primary effect of this change on the model's training objective?
Analysis of Language Model Training Objectives
Selecting an Appropriate Language Model Training Objective