Formula

Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences

The relationship between the joint, conditional, and marginal probabilities of token sequences follows from the chain rule of probability. In logarithmic form, the conditional log-probability of an output sequence y given an input x can be written as the difference between the joint and marginal log-probabilities:

$$\log \text{Pr}_{\theta}(\mathbf{y}|\mathbf{x}) = \log \text{Pr}_{\theta}([\mathbf{x}, \mathbf{y}]) - \log \text{Pr}_{\theta}(\mathbf{x})$$

Rearranging this identity shows how the joint log-probability of the concatenated sequence [x, y] decomposes into a marginal part and a conditional part:

$$\log \text{Pr}_{\theta}([\mathbf{x}, \mathbf{y}]) = \log \text{Pr}_{\theta}(\mathbf{x}) + \log \text{Pr}_{\theta}(\mathbf{y}|\mathbf{x})$$

This relationship is fundamental for training language models, where the objective is often to maximize the conditional log-probability $\log \text{Pr}_{\theta}(\mathbf{y}|\mathbf{x})$.
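The identity can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy bigram model with made-up probabilities (the names `BIGRAM`, `START`, and `log_prob` are illustrative and not from the source). It computes the conditional log-probability of y given x in two ways: by subtracting the marginal from the joint, and by scoring the tokens of y directly as a continuation of x.

```python
import math

# Hypothetical toy bigram model over the vocabulary {a, b, c}.
# Each row gives Pr(next token | previous token); the values are made up.
START = "<s>"
BIGRAM = {
    "<s>": {"a": 0.5, "b": 0.3, "c": 0.2},
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.4, "b": 0.2, "c": 0.4},
    "c": {"a": 0.3, "b": 0.3, "c": 0.4},
}


def log_prob(tokens, prev=START):
    """Sum the per-token log-probabilities, each conditioned on the previous token."""
    total = 0.0
    for tok in tokens:
        total += math.log(BIGRAM[prev][tok])
        prev = tok
    return total


x = ["a", "b"]  # input sequence
y = ["c", "a"]  # output sequence

log_joint = log_prob(x + y)   # log Pr([x, y])
log_marginal = log_prob(x)    # log Pr(x)

# Conditional via the identity: log Pr(y|x) = log Pr([x, y]) - log Pr(x)
log_cond_by_subtraction = log_joint - log_marginal

# Conditional computed directly: score y as a continuation of x.
# In this bigram toy model only the last token of x matters.
log_cond_direct = log_prob(y, prev=x[-1])

print(log_cond_by_subtraction, log_cond_direct)
```

Up to floating-point rounding, both values agree. In this sketch the subtraction is unnecessary in practice, since the conditional term is just the sum of the log-probabilities assigned to the tokens of y; the example only illustrates that the two routes are equivalent.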


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models

Related