Verifying Log-Probability Equivalence
A language model is trained to predict sequences of words. Consider the three-word sequence (x₀, x₁, x₂) = ('the', 'cat', 'sat'). The model assigns the following probabilities:
- The joint probability of the entire sequence: Pr('the', 'cat', 'sat') = 0.01
- The individual conditional probabilities:
  - Pr('the') = 0.1
  - Pr('cat' | 'the') = 0.5
  - Pr('sat' | 'the', 'cat') = 0.2
Your task is to demonstrate the mathematical equivalence between the log-probability of the entire sequence and the sum of the conditional log-probabilities. Calculate both values using the natural logarithm (ln) and show that they are equal.
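The check can be sketched numerically in Python using the standard `math` module (the variable names here are illustrative, not part of the exercise):

```python
import math

# Joint probability of the full sequence ('the', 'cat', 'sat')
joint = 0.01

# Conditional probabilities from the chain-rule factorization:
# Pr('the'), Pr('cat' | 'the'), Pr('sat' | 'the', 'cat')
conditionals = [0.1, 0.5, 0.2]

log_joint = math.log(joint)                          # ln 0.01
sum_log_cond = sum(math.log(p) for p in conditionals)

# The two quantities agree up to floating-point rounding
print(math.isclose(log_joint, sum_log_cond))  # True
```

Note that `math.isclose` is used rather than `==` because summing three separate logarithms can differ from `ln 0.01` in the last floating-point digit.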
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Maximum Likelihood Estimation for Sequential Data
In training a model on a dataset D of sequences \mathbf{x}, a primary goal is to find parameters that maximize the total log-probability of the observed sequences. This objective can be expressed in two equivalent ways:
Form 1: maximize Σ_{x ∈ D} log Pr(x)
Form 2: maximize Σ_{x ∈ D} Σ_i log Pr(x_i | x_<i)
What fundamental principle of probability justifies the mathematical equivalence between Form 1 and Form 2?
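The equivalence of the two forms can be checked on a toy dataset. In this sketch each "sequence" is represented directly by its chain-rule conditionals; the numbers are made up for illustration and do not come from a trained model:

```python
import math

# Toy dataset D: each entry lists the chain-rule conditionals of one sequence.
dataset = [
    [0.1, 0.5, 0.2],  # e.g. the conditionals for ('the', 'cat', 'sat')
    [0.3, 0.4],       # conditionals for some other two-token sequence
]

# Form 1: sum over sequences of the log of the joint probability
form1 = sum(math.log(math.prod(conds)) for conds in dataset)

# Form 2: sum over sequences of the sum of conditional log-probabilities
form2 = sum(sum(math.log(p) for p in conds) for conds in dataset)

print(math.isclose(form1, form2))  # True
```

Both forms compute the same objective; Form 2 is the one used in practice because it avoids multiplying many small probabilities, which underflows quickly in floating point.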
Analysis of a Language Model Training Objective
The mathematical equivalence between maximizing the log-probability of an entire sequence,
log Pr(x), and maximizing the sum of its conditional log-probabilities, Σ log Pr(x_i | x_<i), rests on two facts: the chain rule of probability factorizes the joint probability into a product of conditionals, Pr(x) = Π_i Pr(x_i | x_<i), and the logarithm transforms that product of probabilities into a sum of log-probabilities.
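Written out, the derivation applies the chain rule first and then the log-of-a-product identity:

```latex
\log \Pr(\mathbf{x})
  = \log \prod_{i} \Pr(x_i \mid x_{<i})
  = \sum_{i} \log \Pr(x_i \mid x_{<i})
```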