Example of Left-to-Right Token Generation

To illustrate the left-to-right generation process, consider generating the three tokens $b$, $c$, and $d$ given an initial prefix $\langle s \rangle\ a$. At each step, the language model picks the token $x_i$ from the vocabulary $V$ that maximizes the conditional probability $\Pr(x_i \mid x_0, ..., x_{i-1})$, appending it to the end of the context sequence.

  • Step 1: Given the context $\langle s \rangle\ a$, the model predicts $b$ using the decision rule $\arg\max_{x_2 \in V} \Pr(x_2 \mid \langle s \rangle\ a)$. The overall sequence probability becomes $\Pr(\langle s \rangle) \cdot \Pr(a \mid \langle s \rangle) \cdot \Pr(b \mid \langle s \rangle\ a)$.
  • Step 2: With the new context $\langle s \rangle\ a\ b$, the model predicts $c$ using $\arg\max_{x_3 \in V} \Pr(x_3 \mid \langle s \rangle\ a\ b)$, updating the sequence probability by multiplying it by $\Pr(c \mid \langle s \rangle\ a\ b)$.
  • Step 3: Based on the expanded context $\langle s \rangle\ a\ b\ c$, the model predicts $d$ using $\arg\max_{x_4 \in V} \Pr(x_4 \mid \langle s \rangle\ a\ b\ c)$, again updating the total sequence probability by multiplying it by $\Pr(d \mid \langle s \rangle\ a\ b\ c)$.

This demonstrates how each predicted token is iteratively added to the context to inform the next prediction.
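The three steps above can be sketched as a greedy decoding loop. This is a minimal illustration, not an implementation from the text: `TOY_MODEL` is a hypothetical stand-in for a real language model's conditional distribution, and its probability values are invented for the example.

```python
import math

# Hypothetical conditional distributions Pr(x_i | context), standing in for a
# real language model. Keys are context tuples; values map candidate next
# tokens to probabilities. The numbers are illustrative assumptions.
TOY_MODEL = {
    ("<s>", "a"): {"b": 0.7, "c": 0.2, "d": 0.1},
    ("<s>", "a", "b"): {"b": 0.1, "c": 0.6, "d": 0.3},
    ("<s>", "a", "b", "c"): {"b": 0.2, "c": 0.1, "d": 0.7},
}

def greedy_decode(context, steps):
    """Left-to-right greedy decoding: at each step pick
    argmax_x Pr(x | context), append it to the context, and multiply its
    probability into the running sequence probability."""
    context = list(context)
    seq_log_prob = 0.0  # log-probability of the generated continuation
    for _ in range(steps):
        probs = TOY_MODEL[tuple(context)]
        token = max(probs, key=probs.get)   # argmax over the vocabulary
        seq_log_prob += math.log(probs[token])
        context.append(token)               # the prediction extends the context
    return context, math.exp(seq_log_prob)

tokens, prob = greedy_decode(["<s>", "a"], steps=3)
print(tokens)             # ['<s>', 'a', 'b', 'c', 'd']
print(round(prob, 3))     # 0.7 * 0.6 * 0.7 = 0.294
```

Accumulating log-probabilities and exponentiating at the end, rather than multiplying raw probabilities, avoids numerical underflow when sequences grow long.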

Updated 2026-04-18

Ch.2 Generative Models - Foundations of Large Language Models