Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
The chain rule of probability relates the joint, conditional, and marginal probabilities of token sequences. In logarithmic form, it lets the conditional log-probability of an output sequence y given an input x be expressed in terms of the joint and marginal log-probabilities:

log Pr_θ(y|x) = log Pr_θ([x, y]) - log Pr_θ(x)

Rearranging this identity shows how the joint log-probability of the concatenated sequence [x, y] decomposes into a marginal part and a conditional part:

log Pr_θ([x, y]) = log Pr_θ(x) + log Pr_θ(y|x)

This relationship is fundamental for training language models, where the objective is often to maximize the conditional log-probability log Pr_θ(y|x).
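As a rough numerical illustration of the relationship described above, the sketch below computes log Pr(x) and log Pr([x, y]) from per-token conditional probabilities and recovers log Pr(y|x) as their difference. The probability values and the helper name sequence_log_prob are assumptions made up for this example; they do not come from any particular model or library.

```python
import math

# Hypothetical per-token conditional probabilities for the concatenated
# sequence [x, y]. The numbers are illustrative only, not model outputs.
x_token_probs = [0.1, 0.5, 0.8]  # Pr(x1), Pr(x2 | x1), Pr(x3 | x1, x2)
y_token_probs = [0.4, 0.25]      # Pr(y1 | x), Pr(y2 | x, y1)


def sequence_log_prob(token_probs):
    """Sum per-token log-probabilities (the chain rule in log form)."""
    return sum(math.log(p) for p in token_probs)


log_p_x = sequence_log_prob(x_token_probs)                   # log Pr(x)
log_p_xy = sequence_log_prob(x_token_probs + y_token_probs)  # log Pr([x, y])

# Conditional log-probability as joint minus marginal.
log_p_y_given_x = log_p_xy - log_p_x

# The same quantity computed directly from the y-token terms alone.
direct = sequence_log_prob(y_token_probs)
assert math.isclose(log_p_y_given_x, direct)

# Since log Pr(y | x) <= 0, the joint can never exceed the marginal.
assert log_p_xy <= log_p_x
```

Summing only the y-token terms is what restricting the loss to the output tokens amounts to, while summing over every token of [x, y] gives the joint log-probability.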

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Related
A developer is fine-tuning a language model on a dataset of [instruction, response] pairs. Initially, the training process calculated the prediction loss across all tokens in both the instruction and the response. The developer then modifies the process to calculate loss only on the tokens in the response. What is the primary effect of this change on the model's training objective?
Analysis of Language Model Training Objectives
Selecting an Appropriate Language Model Training Objective
Conditional vs. Joint Probability Objectives in Language Modeling
General Language Modeling Objective based on Joint Log-Probability
A language model is being used to determine the likelihood of a specific sentence. Let the input sequence x be 'The sun is' and the output sequence y be 'shining brightly'. The notation Pr([x, y]) represents the probability of the model generating the full, combined sequence. Which statement best analyzes what this probability value signifies?
Analysis of Sequence Order on Joint Probability
Conditional Log-Probability via Joint and Marginal Log-Probabilities
Model Comparison Using Joint Sequence Probability
Base Case for Sequence Probability
Joint Probability of a Generated Sequence using the Chain Rule
Derivation of Sequence Log-Probability via Chain Rule
Logarithmic Form of the Chain Rule for Sequence Probability
Formula for an Impossible Initial Event
A language model is tasked with calculating the total probability of the three-token sequence 'the cat sat'. The model provides the following probability estimates:
- The probability of the first token is Pr("the") = 0.1
- The probability of the second token, given the first, is Pr("cat" | "the") = 0.5
- The probability of the third token, given the first two, is Pr("sat" | "the", "cat") = 0.8
Using the principle that the joint probability of a sequence is the product of the conditional probabilities of its components, what is the joint probability Pr("the", "cat", "sat")?
Computational Stability of Sequence Probability
Which of the following expressions correctly decomposes the joint probability of a four-token sequence (x₁, x₂, x₃, x₄) using the chain rule of probability?
Learn After
SFT as Language Model Training on Concatenated Sequences
Calculating Conditional Log-Probability Using an LLM
Selective Loss Computation in Joint Probability Language Modeling
Calculating Conditional Log-Probability
An engineer is evaluating a language model and calculates the following log-probabilities for an input sequence x and an output sequence y: the joint log-probability log Pr([x, y]) and the marginal log-probability log Pr(x). They observe that the value of log Pr([x, y]) is significantly more negative than the value of log Pr(x). Based on the fundamental relationship between joint, conditional, and marginal probabilities, what is the most accurate conclusion?
A language model is being evaluated. For a given input sequence x and a potential output sequence y, the model calculates log Pr([x, y]) = -3.5 and log Pr(x) = -5.2. Based on these values, it is reasonable to conclude that the model's probability calculations are functioning correctly.