Formula for Attention Weight with Relative Positional Encoding

One of the simplest forms of self-attention incorporating relative positional embeddings modifies the attention weight calculation while keeping the standard weighted sum for the output. The attention output vector is computed as

$$\mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i, \mathbf{K}_{\le i}, \mathbf{V}_{\le i}) = \sum_{j=0}^{i} \alpha(i,j)\,\mathbf{v}_j$$

The attention weight $\alpha(i,j)$ is obtained by adding a relative positional encoding bias term $\mathrm{PE}(i,j)$ to the query-key product:

$$\alpha(i,j) = \mathrm{Softmax}\left(\frac{\mathbf{q}_i \mathbf{k}_j^\top + \mathrm{PE}(i,j)}{\sqrt{d}} + \mathrm{Mask}(i,j)\right)$$

The only difference between this approach and the original self-attention model is the addition of the $\mathrm{PE}(i,j)$ bias term.
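The following is a minimal NumPy sketch of this computation, not taken from the text: the function name `relative_attention` and the bucketed offset bias used to build `pe_bias` are illustrative assumptions. In practice $\mathrm{PE}(i,j)$ is typically a learned function of the offset $i-j$.

```python
import numpy as np

def relative_attention(Q, K, V, pe_bias):
    """Causal self-attention with an additive relative positional bias.

    Q, K, V : arrays of shape (seq_len, d)
    pe_bias : array of shape (seq_len, seq_len) with pe_bias[i, j] = PE(i, j)
    Returns an array of shape (seq_len, d).
    """
    seq_len, d = Q.shape

    # Query-key scores with the relative positional bias added before scaling.
    scores = (Q @ K.T + pe_bias) / np.sqrt(d)

    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    scores = scores + mask

    # Row-wise softmax gives the attention weights alpha(i, j).
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of the value vectors.
    return weights @ V


# Hypothetical bias: PE(i, j) = b[min(i - j, max_dist)], with b learnable in practice.
seq_len, d, max_dist = 6, 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
b = rng.normal(size=max_dist + 1)
rel = np.clip(np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :], 0, max_dist)
pe_bias = b[rel]

out = relative_attention(Q, K, V, pe_bias)
print(out.shape)  # (6, 8)
```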
