Consider the calculation of an attention weight, which determines the influence of an input at position j on the output at a later position i. The calculation is based on a formula that includes: 1) a similarity score between vectors from positions i and j, 2) a term that depends on the relative distance between i and j, and 3) a masking component that prevents attending to positions k where k > i. If the term that depends on the relative distance were removed from this calculation, what would be the primary consequence?
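The three components named in the question can be sketched together. This is a minimal illustration, not any particular model's implementation: the relative-distance term is shown as a hypothetical additive bias matrix (in the style of learned relative-position biases), and all names are illustrative.

```python
import numpy as np

def attention_weights(Q, K, rel_bias):
    """Causal attention weights combining the three terms from the question.

    Q, K: (n, d) query/key matrices.
    rel_bias: (n, n) matrix whose entry [i, j] is a scalar depending only on
    the distance i - j (hypothetical values; learned in practice).
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # 1) similarity score
    scores = scores + rel_bias                    # 2) relative-distance term
    causal = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(causal, -np.inf, scores)    # 3) mask positions k > i
    # row-wise softmax turns scores into attention weights
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)
```

Deleting the `rel_bias` line leaves the weights invariant to where tokens sit relative to one another: the model can still tell *which* earlier tokens are similar, but not *how far away* they are.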
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Pre-Normalized Attention Scores
In a causal attention mechanism that incorporates relative positional information, consider the calculation of attention for an output at position i. If the dot product of the query vector from position i with the key vector from position j is identical to its dot product with the key vector from position k (where j ≠ k, and both j, k < i), then the final attention weights assigned to positions j and k will also be identical.
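A small numeric check makes the claim in this related card concrete. The scores and distance biases below are made-up values for illustration: two earlier positions receive the same dot product, yet a nonzero relative-distance term separates their final softmax weights.

```python
import numpy as np

# Query at position i = 2 attends to positions j = 0, 1, 2.
sim = np.array([2.0, 2.0, 1.0])          # q_i . k_j: identical for j=0 and j=1
dist_bias = np.array([-1.0, -0.5, 0.0])  # hypothetical bias favoring nearer j

scores = sim + dist_bias
w = np.exp(scores - scores.max())
w = w / w.sum()
# w[0] != w[1]: equal similarities, different final weights
```

So the statement is false whenever the relative-distance term differs for j and k, which it does for any nontrivial bias when j ≠ k.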