Formula for Attention Weight with Relative Positional Encoding
One of the simplest forms of self-attention incorporating relative positional embedding modifies the attention weight calculation while keeping the standard weighted sum for the output. The attention output vector at position i is computed as

output_i = Σ_j α(i, j) v_j

where v_j is the value vector at position j. The attention weight α(i, j) is calculated by adding a relative positional encoding bias term to the query-key product:

α(i, j) = Softmax_j( (q_i ⋅ k_j + PE(i, j)) / √d )

where q_i is the query at position i, k_j is the key at position j, PE(i, j) is the relative positional bias, and d is the dimensionality of the query and key vectors. The only difference between this approach and the original self-attention model is the addition of the bias term PE(i, j).
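Below is a minimal NumPy sketch of these two formulas, assuming a single attention head and a precomputed bias matrix; the function name attention_with_relative_pe and the pe_bias argument are illustrative choices, not part of the source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_relative_pe(Q, K, V, pe_bias):
    """Self-attention with an additive relative positional bias.

    Q, K, V : arrays of shape (n, d) holding query, key, and value vectors.
    pe_bias : array of shape (n, n) with pe_bias[i, j] = PE(i, j).
    """
    d = Q.shape[-1]
    # alpha(i, j) = Softmax_j( (q_i . k_j + PE(i, j)) / sqrt(d) )
    scores = (Q @ K.T + pe_bias) / np.sqrt(d)
    alpha = softmax(scores, axis=-1)
    # output_i = sum_j alpha(i, j) * v_j  (standard weighted sum of values)
    return alpha @ V

# Example: 4 tokens, d = 8, and a bias that penalizes distant positions
# (one common way to instantiate PE(i, j); purely illustrative here).
n, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
pe_bias = -np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
print(attention_with_relative_pe(Q, K, V, pe_bias).shape)  # (4, 8)
```

Setting pe_bias to zero recovers the original scaled dot-product attention, which is exactly the "only difference" noted above.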
