
Direct Computation of Output Sequence Log-Probability in LLMs

In common implementations of Large Language Models (LLMs), the log-probability of the input sequence itself need not be computed. Instead, the model directly computes the conditional log-probability of the output sequence given the input, by summing the log-probabilities of the individual output tokens:

$$
\log \Pr(\mathbf{y} \mid \mathbf{x}) = \sum_{i=1}^{n} \log \Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})
$$

In this notation, $[\mathbf{x}, \mathbf{y}_{<i}]$ represents the context used to predict the token $y_i$. The expression $\Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$ is a common shorthand in the literature for $\Pr(y_i \mid [\mathbf{x}, \mathbf{y}_{<i}])$.
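The sum above can be sketched in a few lines of Python. This is a minimal illustration, not an actual LLM: `step_logits` stands in for the hypothetical vocabulary-sized score vectors a model would emit at each decoding step (each already conditioned on $\mathbf{x}$ and $\mathbf{y}_{<i}$), and `output_ids` are the ids of the tokens $y_i$. Each per-token log-probability is obtained via a numerically stable log-softmax, then summed.

```python
import math

def sequence_log_prob(step_logits, output_ids):
    """Sum the per-token conditional log-probs log Pr(y_i | x, y_<i).

    step_logits[i]: raw scores over the vocabulary at step i
                    (assumed to be conditioned on x and y_<i).
    output_ids[i]:  the id of the token y_i that was actually produced.
    """
    total = 0.0
    for logits, tok in zip(step_logits, output_ids):
        # log-softmax with a max-shift for numerical stability:
        # log Pr(tok) = logits[tok] - log(sum_j exp(logits[j]))
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += logits[tok] - log_z
    return total

# Toy example: a 3-token vocabulary and a 2-token output sequence.
step_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
print(sequence_log_prob(step_logits, [0, 1]))
```

Since each term is the log of a probability, every summand is nonpositive, so the total is as well; in a real system the score vectors would come from a single forward pass over the concatenated input and output tokens.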


Updated 2026-05-03


Tags

Ch.5 Inference - Foundations of Large Language Models


Computing Sciences
