Learn Before
Token Selection from Probability Distribution
After a language model computes the probability distribution for the next token, Pr(·|x_0, ..., x_{i-1}), a specific token x_i must be chosen from this distribution. This selection process, also known as decoding or sampling, is a fundamental step in text generation.
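The two most common selection strategies can be sketched in a few lines. This is a minimal illustration, not a production decoder; the distribution below is a made-up example, and `greedy_select` / `sample_select` are hypothetical helper names:

```python
import random

# Hypothetical next-token distribution Pr(. | x_0, ..., x_{i-1})
probs = {"blue": 0.75, "green": 0.15, "bright": 0.08, "falling": 0.02}

def greedy_select(probs):
    """Deterministic: always pick the single highest-probability token."""
    return max(probs, key=probs.get)

def sample_select(probs, rng=random):
    """Stochastic: sample a token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_select(probs))  # → blue
print(sample_select(probs))  # any of the four tokens, usually 'blue'
```

Greedy selection always returns the same continuation for the same context, while sampling introduces variety at the cost of occasionally picking low-probability tokens.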
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Token Selection from Probability Distribution
Step-by-Step Example of Auto-Regressive Sequence Generation
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Iterative Context Update in Autoregressive Generation
Key-Value (KV) Cache in Transformer Inference
Sequential Generation of Output Tokens
Context Shifting in Auto-Regressive Generation
A language model is generating a sentence and has so far produced the sequence:
['The', 'cat', 'sat']. Based on the principles of sequential, one-at-a-time token generation, where each new token depends on the ones before it, what direct input will the model use to determine the next token in the sequence?

A language model generates text by producing a single token at each step, using the entire sequence generated so far as the context for the next token. Arrange the following events in the correct chronological order to illustrate the generation of two new tokens following the initial input 'The ocean is'.
A researcher develops a novel text generation model. Given an input like 'The movie was', instead of generating one token at a time, this model predicts the entire completion (e.g., 'incredibly boring and predictable') in a single, parallel step. Which core principle of the standard auto-regressive process is fundamentally violated by this new model's design?
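The sequential process these questions describe, where the context grows by one token per step, can be sketched as follows. The `fake_next_token` function is a hypothetical stand-in for the model's distribution-plus-selection step; a real model would compute Pr(·|context) and choose from it:

```python
def fake_next_token(context):
    # Stand-in for a real model: maps a context to its next token.
    # A real model would compute Pr(. | context) and select from it.
    continuations = {
        ("The", "ocean", "is"): "deep",
        ("The", "ocean", "is", "deep"): "blue",
    }
    return continuations.get(tuple(context), "<eos>")

def generate(context, n_steps):
    """Auto-regressive loop: one token per step, each fed back as context."""
    context = list(context)
    for _ in range(n_steps):
        token = fake_next_token(context)  # predict exactly one token
        context.append(token)             # the new token joins the context
    return context

print(generate(["The", "ocean", "is"], 2))
# → ['The', 'ocean', 'is', 'deep', 'blue']
```

The researcher's parallel model in the last question skips this loop entirely, which is precisely the property that makes it non-auto-regressive.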
Learn After
Next Token Prediction Task
Token Sampling from a Conditional Probability Distribution
Using Temperature with Softmax to Control Randomness in Token Selection
A language model is generating text and has produced the sequence 'The sky is'. It then calculates the following probability distribution for the next potential token:
{'blue': 0.75, 'green': 0.15, 'bright': 0.08, 'falling': 0.02}. If the model is configured to always select the single token with the highest probability, which token will it choose next?

Analyzing Token Selection Strategies
A language model is generating text and encounters the same input sequence on two separate occasions, producing two different probability distributions for the next token, shown below.
- Distribution A: {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
- Distribution B: {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}
Which of the following statements provides the most accurate analysis of these two distributions regarding the token selection process?
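One way to make the contrast between the two distributions concrete is to compare their greedy picks and their Shannon entropy (a standard measure of how "flat" a distribution is, not something stated in the card itself):

```python
import math

dist_a = {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
dist_b = {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}

def entropy(probs):
    """Shannon entropy in bits; higher means a flatter, less certain distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

for name, dist in [("A", dist_a), ("B", dist_b)]:
    top = max(dist, key=dist.get)  # the greedy (argmax) choice
    print(f"Distribution {name}: greedy pick={top!r}, entropy={entropy(dist):.2f} bits")
```

Distribution A is sharply peaked (greedy selection is a near-certain bet on 'meal'), while Distribution B is nearly uniform over its top tokens, so a greedy pick of 'soup' discards three alternatives that are almost as likely.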
To ensure the generated text is as coherent and factually accurate as possible, a language model must always select the single token with the highest probability from the distribution at each step of the generation process.