Example of Left-to-Right Token Generation

To illustrate the left-to-right generation process, consider generating the three tokens $b$, $c$, and $d$ given an initial prefix $\langle s \rangle\ a$. At each step, the language model picks the token $x_i$ from the vocabulary $V$ that maximizes the conditional probability $\Pr(x_i \mid x_0, ..., x_{i-1})$, appending it to the end of the context sequence.

  • Step 1: Given the context $\langle s \rangle\ a$, the model predicts $b$ using the decision rule $\arg\max_{x_2 \in V} \Pr(x_2 \mid \langle s \rangle\ a)$. The overall sequence probability becomes $\Pr(\langle s \rangle) \cdot \Pr(a \mid \langle s \rangle) \cdot \Pr(b \mid \langle s \rangle\ a)$.
  • Step 2: With the new context $\langle s \rangle\ a\ b$, the model predicts $c$ using $\arg\max_{x_3 \in V} \Pr(x_3 \mid \langle s \rangle\ a\ b)$, updating the sequence probability by multiplying it by $\Pr(c \mid \langle s \rangle\ a\ b)$.
  • Step 3: Based on the expanded context $\langle s \rangle\ a\ b\ c$, the model predicts $d$ using $\arg\max_{x_4 \in V} \Pr(x_4 \mid \langle s \rangle\ a\ b\ c)$, again updating the total sequence probability by multiplying it by $\Pr(d \mid \langle s \rangle\ a\ b\ c)$.

This demonstrates how each predicted token is iteratively added to the context to inform the next prediction.
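The three steps above can be sketched as a greedy decoding loop. This is a minimal illustration, not an implementation from the text: `TOY_MODEL` is a hypothetical stand-in for a real language model's conditional distribution, and its probability values are invented for the example.

```python
import math

# Hypothetical conditional distributions Pr(x_i | context), standing in for a
# real language model. Keys are context tuples; values map candidate next
# tokens to probabilities. The numbers are illustrative assumptions.
TOY_MODEL = {
    ("<s>", "a"): {"b": 0.7, "c": 0.2, "d": 0.1},
    ("<s>", "a", "b"): {"b": 0.1, "c": 0.6, "d": 0.3},
    ("<s>", "a", "b", "c"): {"b": 0.2, "c": 0.1, "d": 0.7},
}

def greedy_decode(context, steps):
    """Left-to-right greedy decoding: at each step pick
    argmax_x Pr(x | context), append it to the context, and multiply its
    probability into the running sequence probability."""
    context = list(context)
    seq_log_prob = 0.0  # log-probability of the generated continuation
    for _ in range(steps):
        probs = TOY_MODEL[tuple(context)]
        token = max(probs, key=probs.get)   # argmax over the vocabulary
        seq_log_prob += math.log(probs[token])
        context.append(token)               # the prediction extends the context
    return context, math.exp(seq_log_prob)

tokens, prob = greedy_decode(["<s>", "a"], steps=3)
print(tokens)             # ['<s>', 'a', 'b', 'c', 'd']
print(round(prob, 3))     # 0.7 * 0.6 * 0.7 = 0.294
```

Accumulating log-probabilities and exponentiating at the end, rather than multiplying raw probabilities, avoids numerical underflow when sequences grow long.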

Updated 2026-04-18

Ch.2 Generative Models - Foundations of Large Language Models