1Cademy - Softmax-based k-NN Probability Distribution

In a $k$-nearest neighbors ($k$-NN) language model, a retrieval-based probability distribution is defined over the vocabulary $V$. Given a hidden state representation $\mathbf{h}_i$, this distribution is computed by applying the Softmax function to a vector of negative distances: $\mathrm{Pr}_{k\mathrm{nn}}(\cdot|\mathbf{h}_i) = \mathrm{Softmax}\left(\begin{bmatrix} -d_0 & \cdots & -d_{|V|} \end{bmatrix}\right)$. Here, $d_v$ is the distance between $\mathbf{h}_i$ and the retrieved key $\mathbf{z}_j$ if the corresponding reference token $w_j$ matches the $v$-th entry of the vocabulary $V$; otherwise, $d_v$ is $0$.
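
As a rough illustration, the formula can be sketched in a few lines of Python. The function name `knn_distribution`, the dictionary input, and the toy numbers are assumptions made for this sketch, not part of the original definition. The sketch also assumes that each retrieved token already has a single (aggregated) distance and that vocabulary entries are indexed from 0 to `vocab_size - 1`.

```python
import math

def knn_distribution(distances, vocab_size):
    """Softmax over negative distances, following the definition above.

    distances: hypothetical dict mapping a vocabulary index v to the
               distance d_v of its retrieved neighbor (assumed to be
               already aggregated to one value per token).
    """
    # Entries with no matching retrieved token get d_v = 0, as stated in the text.
    d = [0.0] * vocab_size
    for v, dist in distances.items():
        d[v] = dist

    logits = [-x for x in d]
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: a 5-entry vocabulary with two retrieved tokens.
probs = knn_distribution({1: 0.5, 3: 2.0}, vocab_size=5)
print(probs)
```

Among the retrieved tokens, a smaller distance yields a larger probability. Note that under the definition as stated, entries without a retrieved match receive a logit of $0$, which is the largest possible value when distances are non-negative. An implementation that wants such entries to receive no probability mass could initialize `d` with `math.inf` instead; that variant is an assumption of this note and is not stated in the text above.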

Updated 2026-05-02
