Formula for Attention Weight with Relative Positional Encoding

One of the simplest forms of self-attention incorporating relative positional embeddings modifies the attention weight calculation while keeping the standard weighted sum for the output. The attention output vector is computed as

$$\mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i, \mathbf{K}_{\le i}, \mathbf{V}_{\le i}) = \sum_{j=0}^{i} \alpha(i,j)\,\mathbf{v}_j$$

The attention weight $\alpha(i,j)$ is obtained by adding a relative positional encoding bias term $\mathrm{PE}(i,j)$ to the query-key product:

$$\alpha(i,j) = \mathrm{Softmax}\left(\frac{\mathbf{q}_i \mathbf{k}_j^\top + \mathrm{PE}(i,j)}{\sqrt{d}} + \mathrm{Mask}(i,j)\right)$$

The only difference between this approach and the original self-attention model is the addition of the $\mathrm{PE}(i,j)$ bias term.
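The following is a minimal NumPy sketch of this computation, not taken from the text: the function name `relative_attention` and the bucketed offset bias used to build `pe_bias` are illustrative assumptions. In practice $\mathrm{PE}(i,j)$ is typically a learned function of the offset $i-j$.

```python
import numpy as np

def relative_attention(Q, K, V, pe_bias):
    """Causal self-attention with an additive relative positional bias.

    Q, K, V : arrays of shape (seq_len, d)
    pe_bias : array of shape (seq_len, seq_len) with pe_bias[i, j] = PE(i, j)
    Returns an array of shape (seq_len, d).
    """
    seq_len, d = Q.shape

    # Query-key scores with the relative positional bias added before scaling.
    scores = (Q @ K.T + pe_bias) / np.sqrt(d)

    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    scores = scores + mask

    # Row-wise softmax gives the attention weights alpha(i, j).
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of the value vectors.
    return weights @ V


# Hypothetical bias: PE(i, j) = b[min(i - j, max_dist)], with b learnable in practice.
seq_len, d, max_dist = 6, 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
b = rng.normal(size=max_dist + 1)
rel = np.clip(np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :], 0, max_dist)
pe_bias = b[rel]

out = relative_attention(Q, K, V, pe_bias)
print(out.shape)  # (6, 8)
```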
