Softmax Renormalization in Top-k Sampling

In top-k sampling, after the candidate pool $\overline{V}_i$ is determined, the probability distribution over this restricted set can be calculated by applying the Softmax function to the logits of the tokens in the pool. If $u_{y_i}$ represents the logit for token $y_i$, the rescaled probability $\overline{\Pr}(y_i|\mathbf{x},\mathbf{y}_{<i})$ for each $y_i \in \overline{V}_i$ is given by:

$$\overline{\Pr}(y_i|\mathbf{x},\mathbf{y}_{<i}) = \frac{\exp(u_{y_i})}{\sum_{y_j \in \overline{V}_i} \exp(u_{y_j})}$$

Because the sum in the denominator runs only over $\overline{V}_i$, the rescaled probabilities of the tokens in the pool sum to 1.
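The renormalization can be illustrated with a minimal Python sketch. It assumes the candidate pool consists of the indices of the k largest logits; the function name `top_k_renormalize` and the example logits are hypothetical, chosen only for illustration. Subtracting the largest logit before exponentiating is a common numerical-stability step that leaves the ratio in the formula unchanged.

```python
import math


def top_k_renormalize(logits, k):
    """Return {token_index: rescaled probability} over the k highest-logit tokens.

    Hypothetical helper: `logits` is a list of floats, one per vocabulary entry.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")

    # Candidate pool (assumption: the indices of the k largest logits).
    pool = sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]

    # Shift by the largest logit in the pool to avoid overflow in exp();
    # the shift cancels between numerator and denominator.
    u_max = max(logits[j] for j in pool)
    weights = {j: math.exp(logits[j] - u_max) for j in pool}

    # Denominator: sum of exponentials over the pool only.
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Illustrative logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = top_k_renormalize(logits, k=2)
# Only tokens 0 and 1 remain; their probabilities are about 0.731 and 0.269.
```

Tokens outside the pool do not appear in the result, which corresponds to assigning them probability zero before sampling.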

Updated 2026-05-05

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
