Formula

Temperature-Scaled Softmax for Token Probability

This formula gives the probability of generating a specific token $y_i$ given an input $\mathbf{x}$ and the preceding tokens $\mathbf{y}_{<i}$. It applies the softmax function to the model's output scores (logits), denoted $u$, over all tokens $y_j$ in the vocabulary $V$. A temperature parameter $\beta$ scales the logits before the exponential is applied. This scaling adjusts the shape of the probability distribution: lower temperatures produce a sharper, more deterministic distribution, while higher temperatures produce a flatter, more random one. The formula is:

$$\Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i}) = \frac{\exp(u_{y_i} / \beta)}{\sum_{y_j \in V} \exp(u_{y_j} / \beta)}$$
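The formula can be sketched directly in code. The snippet below is a minimal illustration, not taken from the book: it divides the logits by the temperature, subtracts the maximum for numerical stability (a standard trick that leaves the result unchanged), and normalizes. The function name and example logits are hypothetical.

```python
import math

def temperature_softmax(logits, beta=1.0):
    """Pr(y_i) = exp(u_{y_i} / beta) / sum_j exp(u_{y_j} / beta)."""
    # Scale logits by the temperature beta.
    scaled = [u / beta for u in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-token vocabulary.
logits = [2.0, 1.0, 0.5]
sharp = temperature_softmax(logits, beta=0.5)  # lower beta: sharper distribution
flat = temperature_softmax(logits, beta=2.0)   # higher beta: flatter distribution
```

Comparing the two outputs shows the effect described above: with `beta=0.5` the highest-logit token captures most of the probability mass, while with `beta=2.0` the mass spreads more evenly across the vocabulary.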

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences