Formula

Logarithmic Form of the Chain Rule for Sequence Probability

The chain rule for the joint probability of a sequence can also be written in logarithmic form. Taking the logarithm of the whole expression turns the product of conditional probabilities into a sum. The summation form is numerically more stable, especially for long sequences, because it avoids the underflow that results from multiplying many probabilities smaller than 1. The formula is: $\log \Pr(x_0,...,x_m) = \sum_{i=0}^{m} \log \Pr(x_{i}|x_0,...,x_{i-1})$. In this formulation, the initial token ($i=0$) is assumed to have probability $\Pr(x_{i}|x_0,...,x_{i-1}) = \Pr(x_0) = 1$. As a consequence, the $i=0$ term contributes $\log 1 = 0$ to the sum, and the overall probability of the sequence simplifies to $\Pr(x_0,...,x_m) = \Pr(x_0)\Pr(x_1,...,x_m|x_0) = \Pr(x_1,...,x_m|x_0)$.
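The stability argument can be illustrated with a minimal Python sketch. The function names and the probability values are hypothetical and chosen only for illustration: each entry of `cond_probs` stands for one conditional probability of the form Pr(x_i | x_0, ..., x_{i-1}), with the first entry set to 1 to match the assumption Pr(x_0) = 1.

```python
import math


def sequence_prob_naive(cond_probs):
    """Multiply the conditional probabilities directly (product form)."""
    result = 1.0
    for p in cond_probs:
        result *= p
    return result


def sequence_log_prob(cond_probs):
    """Sum the log conditional probabilities (logarithmic form)."""
    return sum(math.log(p) for p in cond_probs)


# Hypothetical sequence: Pr(x_0) = 1, followed by 2000 tokens that each
# have conditional probability 0.1 (illustrative values, not from a model).
cond_probs = [1.0] + [0.1] * 2000

naive = sequence_prob_naive(cond_probs)    # underflows in double precision
log_prob = sequence_log_prob(cond_probs)   # stays a finite negative number

print("product form:", naive)
print("log form:    ", log_prob)
```

Under these assumed values the direct product underflows to 0.0 in double-precision floating point, while the sum of logs stays finite at roughly -4605.17. The first entry contributes log 1 = 0, so it does not change the sum.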

Updated 2026-04-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences