In a causal attention mechanism that incorporates relative positional information, consider the calculation of attention for an output at position i. If the dot product of the query vector from position i with the key vector from position j is identical to its dot product with the key vector from position k (where j ≠ k, and both j, k < i), then the final attention weights assigned to positions j and k will also be identical.
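A minimal NumPy sketch of this setup (assuming, purely for illustration, an ALiBi-style linear distance bias with a hypothetical slope; the card does not specify the form of the relative-position term): because the bias added for position j depends on the distance i − j, two positions with identical query-key dot products generally receive different pre-softmax scores, and therefore different attention weights.

```python
# Sketch (not any specific library's API): causal attention with an
# ALiBi-style relative-distance bias. Equal dot products at different
# distances still yield different attention weights.
import numpy as np

def causal_attention_weights(scores, slope=0.5):
    """Softmax over causally masked scores plus a linear distance bias.

    scores[i, j] holds the raw dot product q_i . k_j; `slope` is an
    assumed bias strength chosen for illustration.
    """
    n = scores.shape[0]
    i_idx = np.arange(n)[:, None]
    j_idx = np.arange(n)[None, :]
    bias = -slope * (i_idx - j_idx)                 # relative-distance term b(i - j)
    masked = np.where(j_idx <= i_idx, scores + bias, -np.inf)  # causal mask: no j > i
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Positions j = 0 and k = 1 have identical dot products with the query at i = 2 ...
scores = np.array([[1.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.7, 0.7, 1.0]])
w = causal_attention_weights(scores)
print(w[2, 0], w[2, 1])   # ... yet their final weights differ, because the
                          # distance bias differs (i - j = 2 vs. i - k = 1).
```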
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Pre-Normalized Attention Scores
Consider the calculation of an attention weight, which determines the influence of an input at position j on the output at a later position i. The calculation is based on a formula that includes: 1) a similarity score between vectors from positions i and j, 2) a term that depends on the relative distance between i and j, and 3) a masking component that prevents attending to positions k where k > i. If the term that depends on the relative distance were removed from this calculation, what would be the primary consequence?
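A small sketch of the scenario this related question describes (same assumptions as above: NumPy, a single head, no scaling): with the relative-distance term removed, the score for each visible position depends only on query-key similarity, so reordering the keys merely reorders the weights. The mechanism then carries no notion of word order beyond the causal mask.

```python
# Sketch: attention scores with the relative-distance term removed depend
# only on content, so shuffling the visible keys just shuffles the weights.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=4)            # query at position i = 3
keys = rng.normal(size=(3, 4))    # keys at visible positions j = 0, 1, 2

def weights_without_distance_term(q, keys):
    s = keys @ q                                  # similarity scores only
    e = np.exp(s - s.max())
    return e / e.sum()

w = weights_without_distance_term(q, keys)
w_shuffled = weights_without_distance_term(q, keys[[2, 0, 1]])
# The weight attached to each key vector is unchanged by reordering:
print(np.allclose(w[[2, 0, 1]], w_shuffled))      # True
```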