
Formula for Optimal Output Sequence in LLMs

In language model inference, the optimal output sequence, denoted $\hat{\mathbf{y}}$, is found by maximizing the conditional log probability of the output sequence given the input sequence $\mathbf{x}$. This objective is formally expressed by decomposing the joint probability over the $n$ output tokens:

$$\hat{\mathbf{y}} = \argmax_{\mathbf{y}} \log \Pr(\mathbf{y} \mid \mathbf{x}) = \argmax_{\mathbf{y}} \sum_{i=1}^{n} \log \Pr(y_i \mid x_0, \ldots, x_m, y_1, \ldots, y_{i-1})$$

In this formulation, the input sequence is represented as $x_0, \ldots, x_m$, and the equation models the log probability of predicting subsequent tokens starting from position $m+1$, rather than position $0$.
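The decomposition above can be sketched in code. The snippet below is a minimal toy illustration, not a real LLM: `next_token_probs` is a hypothetical stand-in for the model's conditional distribution $\Pr(y_i \mid x_0,\ldots,x_m,y_1,\ldots,y_{i-1})$, and greedy decoding is used as a cheap local approximation of the intractable $\argmax$ over all sequences.

```python
import math

# Toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["<eos>", "a", "b"]

def next_token_probs(context):
    # Hypothetical stand-in for Pr(y_i | x_0..x_m, y_1..y_{i-1}).
    # Here the "model" favors repeating "a", otherwise prefers to stop.
    if context and context[-1] == "a":
        return {"<eos>": 0.2, "a": 0.5, "b": 0.3}
    return {"<eos>": 0.6, "a": 0.3, "b": 0.1}

def sequence_log_prob(x, y):
    """Sum of log Pr(y_i | x, y_1..y_{i-1}) over the n output tokens,
    i.e. the objective being maximized in the formula above."""
    total = 0.0
    context = list(x)  # start from the input tokens x_0..x_m
    for tok in y:
        total += math.log(next_token_probs(context)[tok])
        context.append(tok)  # condition on previously generated tokens
    return total

def greedy_decode(x, max_len=5):
    """Approximate argmax_y by picking the locally best token each step.
    Exact maximization over all sequences is intractable in practice."""
    y = []
    context = list(x)
    for _ in range(max_len):
        probs = next_token_probs(context)
        tok = max(probs, key=probs.get)
        y.append(tok)
        context.append(tok)
        if tok == "<eos>":
            break
    return y
```

Greedy decoding maximizes each term of the sum independently, so it is not guaranteed to find the globally optimal $\hat{\mathbf{y}}$; beam search or sampling strategies trade off between search quality and cost.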


Updated 2026-04-19

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models
