In a sequence processing model, the unnormalized attention score between a query at position i and a key at position j is calculated using the formula: Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d. What is the primary function of the PE(i, j) term in this calculation?
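To make the formula concrete, here is a minimal Python sketch that evaluates Score(i, j) for every query/key pair. The helper names `attention_scores` and `distance_penalty`, the `slope` value, and the toy vectors are illustrative assumptions and not part of the original question. The distance-based form of PE(i, j) is only one possible choice of positional term.

```python
import math

def attention_scores(queries, keys, pe):
    """Return Score(i, j) = (q_i . k_j + PE(i, j)) / sqrt(d) for all i, j."""
    d = len(queries[0])          # dimensionality of each query/key vector
    scale = math.sqrt(d)
    scores = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k))   # content term q_i . k_j
            row.append((dot + pe(i, j)) / scale)     # add positional term, then scale
        scores.append(row)
    return scores

def distance_penalty(i, j, slope=0.5):
    # Hypothetical PE(i, j): depends only on the offset between positions,
    # and lowers the score as the query and key move further apart.
    return -slope * abs(i - j)

# Identical content vectors at every position, so any difference
# between scores comes from the positional term alone.
queries = [[1.0, 0.0, 0.0, 0.0]] * 3
keys = [[1.0, 0.0, 0.0, 0.0]] * 3

scores = attention_scores(queries, keys, distance_penalty)
for row in scores:
    print(row)
```

Because the content term q_i ⋅ k_j is the same for every pair in this toy input, the variation across each printed row is due entirely to PE(i, j).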
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Causal Attention
Analyzing Components of an Attention Score Formula
Diagnosing a Language Model's Performance Issue
Interpretation of Positional Bias as a Distance Penalty