Formula

Softmax-based k-NN Probability Distribution

In a $k$-nearest neighbors ($k$-NN) language model, a retrieval-based probability distribution is defined over the vocabulary $V$. Given a hidden state representation $\mathbf{h}_i$, this distribution is computed by applying the Softmax function to a vector of negative distances:

$$\mathrm{Pr}_{k\mathrm{nn}}(\cdot|\mathbf{h}_i) = \mathrm{Softmax}\left(\begin{bmatrix} -d_0 & \cdots & -d_{|V|} \end{bmatrix}\right)$$

Here, $d_v$ represents the distance between $\mathbf{h}_i$ and the retrieved key $\mathbf{z}_j$ if the corresponding reference token $w_j$ matches the $v$-th entry of the vocabulary $V$; otherwise, $d_v$ is $0$.
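The definition above can be illustrated with a minimal Python sketch. The function name `knn_distribution`, the `(token_id, distance)` input format, and the zero-based indexing over `vocab_size` entries are assumptions made for illustration. The source does not say what to do when several retrieved keys share the same reference token, so this sketch keeps the smallest distance; that choice is also an assumption.

```python
import math

def knn_distribution(vocab_size, retrieved):
    """Softmax over negative distances, following the definition above.

    retrieved: list of (token_id, distance) pairs, one per retrieved key z_j,
               where token_id is the vocabulary index of its reference token w_j.
    """
    # d_v is 0 for every vocabulary entry with no matching retrieved token.
    d = [0.0] * vocab_size
    seen = set()
    for token_id, dist in retrieved:
        if token_id in seen:
            # Assumption: when a token is retrieved more than once,
            # keep its smallest distance.
            d[token_id] = min(d[token_id], dist)
        else:
            d[token_id] = dist
            seen.add(token_id)

    # Softmax of [-d_0, ..., -d_{n-1}], shifted by the max for numerical stability.
    logits = [-x for x in d]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: a 4-token vocabulary and three retrieved neighbors.
probs = knn_distribution(4, [(1, 2.0), (3, 0.5), (1, 1.0)])
```

In this example the distance vector is `[0.0, 1.0, 0.0, 0.5]`, so among the retrieved tokens the closer one (index 3) receives more probability than the farther one (index 1). Entries for tokens that were not retrieved keep $d_v = 0$, exactly as stated in the definition.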

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences