Formula

Temperature-Scaled Softmax for Token Probability

This formula gives the probability of generating a specific token $y_i$ given an input $\mathbf{x}$ and the preceding tokens $\mathbf{y}_{<i}$. It applies the softmax function to the model's output scores (logits), denoted $u$, over all tokens $y_j$ in the vocabulary $V$. A temperature parameter $\beta$ scales the logits before the exponential is applied. This scaling adjusts the shape of the probability distribution: lower temperatures produce a sharper, more deterministic distribution, while higher temperatures produce a flatter, more random one. The formula is:

$$\Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i}) = \frac{\exp(u_{y_i} / \beta)}{\sum_{y_j \in V} \exp(u_{y_j} / \beta)}$$
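The formula can be sketched directly in code. The snippet below is a minimal illustration, not taken from the book: it divides the logits by the temperature, subtracts the maximum for numerical stability (a standard trick that leaves the result unchanged), and normalizes. The function name and example logits are hypothetical.

```python
import math

def temperature_softmax(logits, beta=1.0):
    """Pr(y_i) = exp(u_{y_i} / beta) / sum_j exp(u_{y_j} / beta)."""
    # Scale logits by the temperature beta.
    scaled = [u / beta for u in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-token vocabulary.
logits = [2.0, 1.0, 0.5]
sharp = temperature_softmax(logits, beta=0.5)  # lower beta: sharper distribution
flat = temperature_softmax(logits, beta=2.0)   # higher beta: flatter distribution
```

Comparing the two outputs shows the effect described above: with `beta=0.5` the highest-logit token captures most of the probability mass, while with `beta=2.0` the mass spreads more evenly across the vocabulary.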

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences