Formula

Probability Renormalization Formula for Restricted Vocabulary Sampling

In sampling-based decoding methods such as top-k or top-p sampling, once a restricted vocabulary $\overline{V}_i$ has been selected, the probabilities of the tokens in this set are rescaled so that they form a new, valid probability distribution. The renormalized probability $\overline{\text{Pr}}$ of a token $y_i$ is its original conditional probability divided by the sum of the original probabilities of all tokens $y_j$ in the restricted set $\overline{V}_i$:

$$\overline{\text{Pr}}(y_i|\mathbf{x}, \mathbf{y}_{<i}) = \frac{\text{Pr}(y_i|\mathbf{x}, \mathbf{y}_{<i})}{\sum_{y_j \in \overline{V}_i} \text{Pr}(y_j|\mathbf{x}, \mathbf{y}_{<i})}$$
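The renormalization step can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a reference implementation: the helper names `renormalize`, `top_k_indices`, and `top_p_indices` are hypothetical, and the toy distribution stands in for a model's next-token probabilities. The top-p rule shown (keep the smallest prefix of tokens, sorted by probability, whose cumulative mass reaches `p`) is one common convention and is assumed here.

```python
import numpy as np


def renormalize(probs, keep_indices):
    """Divide each kept token's probability by the total mass of the kept set.

    Tokens outside the restricted vocabulary get probability 0.
    """
    probs = np.asarray(probs, dtype=float)
    restricted = np.zeros_like(probs)
    restricted[keep_indices] = probs[keep_indices]
    total = restricted.sum()  # denominator: sum over the restricted set
    if total <= 0.0:
        raise ValueError("restricted vocabulary has zero probability mass")
    return restricted / total


def top_k_indices(probs, k):
    """Indices of the k most probable tokens (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    return np.argsort(probs)[::-1][:k]


def top_p_indices(probs, p):
    """Smallest high-probability prefix whose cumulative mass reaches p
    (hypothetical helper; assumes 0 < p <= 1)."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = min(int(np.searchsorted(cumulative, p)) + 1, len(order))
    return order[:cutoff]


# Toy next-token distribution over a 4-token vocabulary (illustrative only).
probs = np.array([0.5, 0.3, 0.1, 0.1])

top_k_probs = renormalize(probs, top_k_indices(probs, k=2))
top_p_probs = renormalize(probs, top_p_indices(probs, p=0.7))

# Sample one token index from the renormalized top-k distribution.
rng = np.random.default_rng(0)
sampled = rng.choice(len(probs), p=top_k_probs)
```

With `k=2`, the kept tokens have original probabilities 0.5 and 0.3, so the denominator is 0.8 and the renormalized values are 0.5/0.8 = 0.625 and 0.3/0.8 = 0.375. The excluded tokens receive probability 0, and the result sums to 1.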

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences