Formula

Conditional Probability Formula for Autoregressive Models using Softmax

In autoregressive models, the conditional probability of the next token $y_i$, given an input $\mathbf{x}$ and the preceding tokens $\mathbf{y}_{<i}$, is often calculated using the softmax function. This is expressed as:

$$\overline{\text{Pr}}(y_i|\mathbf{x}, \mathbf{y}_{<i}) = \frac{\exp(u_{y_i})}{\sum_{y_j \in \overline{V}_i} \exp(u_{y_j})}$$

Here, $u_{y_i}$ represents the unnormalized score (or logit) for the token $y_i$. The probability is obtained by exponentiating this score and normalizing it by the sum of the exponentiated scores of all candidate tokens $y_j$ within a specific vocabulary subset $\overline{V}_i$.
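As a rough illustration of the formula, the sketch below normalizes logits over a candidate subset only. The function name `restricted_softmax`, the toy logits, and the candidate list are hypothetical and chosen for illustration. Subtracting the maximum logit before exponentiating is a common numerical-stability step that is not part of the formula as stated; it leaves the resulting probabilities unchanged.

```python
import math

def restricted_softmax(logits, candidates):
    """Softmax over the tokens in `candidates` only (the subset V_i).

    logits: dict mapping token -> unnormalized score u_y (hypothetical toy input)
    candidates: non-empty list of tokens allowed at this step
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Shift by the largest score so exp() cannot overflow; the shift cancels
    # between numerator and denominator, so the probabilities are unaffected.
    max_score = max(logits[token] for token in candidates)
    exps = {token: math.exp(logits[token] - max_score) for token in candidates}
    normalizer = sum(exps.values())
    return {token: value / normalizer for token, value in exps.items()}

# Toy example: "the" has a logit but is outside the candidate subset,
# so it receives no probability mass.
logits = {"cat": 2.0, "dog": 1.0, "car": 0.0, "the": -1.0}
candidates = ["cat", "dog", "car"]
probs = restricted_softmax(logits, candidates)
print(probs)
```

Because the normalizer sums only over the candidate subset, the returned probabilities sum to 1 over that subset, and tokens outside it are simply absent from the result.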

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
