Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Maximum Likelihood Estimation for Sequential Data
In training a model on a dataset $D$ of sequences $\mathbf{x}$, a primary goal is to find parameters that maximize the total log-probability of the observed sequences. This objective can be expressed in two equivalent ways:
Form 1: $\max_{\theta} \sum_{\mathbf{x} \in D} \log \Pr(\mathbf{x}; \theta)$

Form 2: $\max_{\theta} \sum_{\mathbf{x} \in D} \sum_{i} \log \Pr(x_i \mid x_{<i}; \theta)$
What fundamental principle of probability justifies the mathematical equivalence between Form 1 and Form 2?
Verifying Log-Probability Equivalence
Analysis of a Language Model Training Objective
The equivalence follows from the chain rule of probability, which factorizes the joint probability of a sequence into its conditionals: $\Pr(\mathbf{x}) = \prod_i \Pr(x_i \mid x_{<i})$. Taking the logarithm of both sides then turns this product of conditional probabilities into a sum of conditional log-probabilities, $\log \Pr(\mathbf{x}) = \sum_i \log \Pr(x_i \mid x_{<i})$, so maximizing either form yields the same parameters.
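The equivalence can be checked numerically. The sketch below uses a hypothetical bigram-style model (the conditional table is an illustrative assumption, not from the source): it computes Form 1 by multiplying the chain-rule conditionals and taking one logarithm, and Form 2 by summing the conditional log-probabilities, then confirms the two agree.

```python
import math

def cond_prob(token, prefix):
    """Pr(x_i | x_<i): a hypothetical bigram-style conditional,
    depending only on the previous token (or start of sequence)."""
    table = {
        None: {"a": 0.6, "b": 0.3, "</s>": 0.1},   # start of sequence
        "a":  {"a": 0.2, "b": 0.5, "</s>": 0.3},
        "b":  {"a": 0.4, "b": 0.1, "</s>": 0.5},
    }
    key = prefix[-1] if prefix else None
    return table[key][token]

x = ["a", "b", "b", "</s>"]  # one observed sequence

# Form 1: log of the joint probability, built via the chain rule.
joint = 1.0
for i, tok in enumerate(x):
    joint *= cond_prob(tok, x[:i])
form1 = math.log(joint)

# Form 2: sum of the conditional log-probabilities.
form2 = sum(math.log(cond_prob(tok, x[:i])) for i, tok in enumerate(x))

assert math.isclose(form1, form2)
print(form1, form2)
```

Both forms evaluate the same quantity; Form 2 is the one used in practice, since summing log-probabilities avoids the numerical underflow that multiplying many small probabilities would cause.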