Conditional Probability Formula for Autoregressive Models using Softmax
In autoregressive models, the conditional probability of the next token $y_j$, given an input $x$ and the preceding tokens $y_{<j}$, is often calculated using the softmax function. This is expressed as:

$$\Pr(y_j \mid x, y_{<j}) = \frac{\exp(s(y_j))}{\sum_{y \in V'} \exp(s(y))}$$

Here, $s(y_j)$ represents the unnormalized score (or logit) for the token $y_j$. The probability is obtained by exponentiating this score and normalizing it by the sum of exponentiated scores of all candidate tokens within a specific vocabulary subset $V'$.
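A minimal sketch of this computation in plain Python (the token names and scores here are illustrative, not from the model itself):

```python
import math

def softmax_probs(logits):
    """Convert unnormalized scores s(y) into a probability distribution.

    Subtracting the maximum logit first leaves the result unchanged
    (softmax is shift-invariant) but avoids overflow in exp().
    """
    m = max(logits.values())
    exps = {tok: math.exp(s - m) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for a small set of candidate next tokens V'
logits = {"mat": 6.0, "rug": 3.0, "floor": 0.5, "chair": 0.5}
probs = softmax_probs(logits)
print(probs)  # values sum to 1; the highest-scoring token dominates
```

Because the scores are exponentiated before normalizing, even moderate gaps between logits turn into large gaps between probabilities.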

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is predicting the next word in a sequence. After processing the context, it has assigned the following unnormalized scores to a set of four candidate words: 'mat' (score=6.0), 'rug' (score=3.0), 'floor' (score=0.5), and 'chair' (score=0.5). To convert these scores into a valid probability distribution over this set, what is the final probability assigned to the word 'mat'?
A language model is evaluating three candidate tokens (A, B, C) to follow a given context. Initially, their scores are: Token A = 4, Token B = 4, Token C = 2. If the score for Token C is increased to 12, while the scores for Token A and Token B remain unchanged, how does this affect the normalized probabilities of Token A and Token B?
Comparing Model Confidence via Probability Normalization
Softmax Function
Pros and Cons of Softmax Function
Softmax Regression (Activation)
Parameterized Softmax Layer
Plackett-Luce Selection Probability Formula
Conditional Probability Formula for Autoregressive Models using Softmax
A neural network's final layer produces the raw output scores (logits) [2.0, 1.0, 0.1] for three possible classes. To convert these scores into class probabilities, a function is applied that first exponentiates each score and then normalizes these new values by dividing each by their sum. What is the resulting probability distribution? (Values are rounded to three decimal places.)
A function is used to convert a vector of raw, unnormalized scores z = [z_1, z_2, ..., z_K] into a probability distribution. This function operates by first applying the standard exponential function to each score and then normalizing these new values by dividing each by their sum. If a constant value C is added to every score in the input vector z, resulting in a new vector z' = [z_1+C, z_2+C, ..., z_K+C], how will the resulting output probability distribution be affected?
Consider two input vectors of raw scores (logits) for a 3-class classification problem: Vector A = [1, 2, 3] and Vector B = [1, 5, 10]. Both vectors are passed through a function that exponentiates each score and then normalizes the results by dividing by their sum. How will the resulting probability distribution for Vector B compare to the one for Vector A?
You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Derivative of Softmax Cross-Entropy Loss with Respect to Logits
Numerical Overflow in Softmax Function
Learn After
Token Sampling from a Conditional Probability Distribution
Calculating Next-Token Probability
An autoregressive model is generating a sequence and has computed the following unnormalized scores (logits) for three candidate next tokens: Token A (3.0), Token B (1.0), and Token C (0.0). If a constant value of 10.0 is added to each of these three logits before the final probability normalization step, how will the resulting conditional probabilities for the tokens be affected?
An autoregressive language model calculates unnormalized scores (logits) for a set of candidate next tokens. These scores are then transformed into a probability distribution. What is the primary reason for applying an exponential function to each logit before the final normalization step?
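The shift-invariance asked about in the first question above can be checked directly. A quick sketch (the three logits are the ones given in that question):

```python
import math

def softmax(scores):
    # Exponentiate each score, then normalize by the sum.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

base = [3.0, 1.0, 0.0]              # logits for tokens A, B, C
shifted = [s + 10.0 for s in base]  # add a constant to every logit

p_base = softmax(base)
p_shifted = softmax(shifted)

# The common factor exp(10) cancels in numerator and denominator,
# so both calls produce the same distribution.
print(p_base)
print(p_shifted)
```

This cancellation is also why practical softmax implementations subtract the maximum logit before exponentiating: it changes nothing mathematically but prevents numerical overflow.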