
Mathematical Justification for Greedy Search

The mathematical basis for greedy search relies on simplifying its core objective. At step $i$, the goal is to select the token $y_i$ that maximizes the log-probability of the entire sequence up to that point, $\log \Pr(y_1 \ldots y_i \mid \mathbf{x})$. Since this total log-probability decomposes into the accumulated log-probability of the preceding sequence $\log \Pr(\mathbf{y}_{<i} \mid \mathbf{x})$ (which is fixed with respect to $y_i$) and the conditional log-probability of the new token $\log \Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$, maximizing the sum simplifies to maximizing only the newly computed token log-probability. The formal derivation is:

$$
y_i^{\mathrm{top}1} = \arg\max_{y_i \in V} \log \Pr(y_1 \ldots y_i \mid \mathbf{x}) = \arg\max_{y_i \in V} \big[ \log \Pr(\mathbf{y}_{<i} \mid \mathbf{x}) + \log \Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i}) \big] = \arg\max_{y_i \in V} \log \Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})
$$
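The decomposition above can be sketched in code. The following is a minimal illustration, not a real language model: `next_token_logprobs` returns hypothetical conditional distributions $\Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$ over a toy three-token vocabulary, and the decoding loop picks the arg max of the conditional log-probability alone, exactly as the derivation licenses.

```python
import math

# Hypothetical conditional distributions Pr(y_i | x, y_{<i}) over a toy
# vocabulary V = {"a", "b", "</s>"}; a real LM would supply these values.
def next_token_logprobs(prefix):
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)},
    }
    default = {"a": math.log(0.2), "b": math.log(0.3), "</s>": math.log(0.5)}
    return table.get(prefix, default)

def greedy_decode(max_len=10):
    prefix = ()
    total_logprob = 0.0  # accumulated log Pr(y_1 ... y_i | x)
    for _ in range(max_len):
        logprobs = next_token_logprobs(prefix)
        # Selecting the arg max of the conditional log-probability alone
        # is valid because the accumulated term log Pr(y_{<i} | x) is a
        # constant with respect to the candidate token y_i.
        y_i = max(logprobs, key=logprobs.get)
        total_logprob += logprobs[y_i]
        prefix = prefix + (y_i,)
        if y_i == "</s>":
            break
    return prefix, total_logprob
```

Note that the loop never stores or compares full-sequence scores across candidates at a step; the running sum `total_logprob` is tracked only for reporting, which is precisely the simplification the derivation justifies.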

Updated 2026-05-03

Ch.5 Inference - Foundations of Large Language Models
