Learn Before
Effect of Distance-Based Attention Penalty
A self-attention mechanism is modified to include a bias term. This term is calculated by taking the distance between a query token and a key token and multiplying it by a fixed negative number. Explain how this modification influences the attention scores for tokens that are close together compared to tokens that are far apart.
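The effect described above can be seen numerically. The sketch below (a minimal illustration, not any particular library's implementation) holds the raw query-key similarities constant so that any difference in the final attention weights comes purely from the distance-based penalty; the slope value of -0.5 is an arbitrary choice standing in for the "fixed negative number."

```python
import numpy as np

# A query at position 5 attends over keys at positions 0..5.
# All raw query-key dot products are set equal, so differences in the
# final weights come only from the distance-based bias.
raw_scores = np.zeros(6)           # identical content similarity for every key
positions = np.arange(6)
query_pos = 5
slope = -0.5                       # the "fixed negative number" (illustrative value)

distances = query_pos - positions          # 5, 4, 3, 2, 1, 0
biased = raw_scores + slope * distances    # nearer keys are penalized less

# Softmax turns the biased scores into attention weights.
weights = np.exp(biased) / np.exp(biased).sum()

# The nearest key (distance 0) gets the largest weight; the farthest
# (distance 5) gets the smallest, and weights rise monotonically with proximity.
print(weights)
```

Because the penalty is subtracted before the softmax, nearby tokens end up with systematically higher attention weights than distant ones, even when their content similarity is identical.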
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
ALiBi Bias Term Formula
Imagine a self-attention mechanism where a modification adds a penalty to the attention score between any two words. This penalty grows linearly with the distance between the words' positions in the sequence. What is the most likely behavioral outcome of this modification?
In a self-attention mechanism that incorporates a linear bias based on token distance, the bias term added to the attention score is a non-positive value whose magnitude grows linearly as the distance between the query and key increases, so distant tokens are penalized more heavily than nearby ones.
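The ALiBi bias term can be sketched as a matrix added to the pre-softmax attention scores. The version below uses the symmetric distance |i - j| for simplicity; the original ALiBi paper applies the bias only over causal (key-before-query) positions and uses head-specific slopes forming the geometric sequence 2^(-8/n), 2^(-16/n), ... for n heads, which this sketch reproduces.

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Return a (num_heads, seq_len, seq_len) matrix of non-positive biases.

    Each head h gets slope 2**(-8 * (h + 1) / num_heads); the bias for a
    query-key pair is -slope * distance, so it is 0 on the diagonal and
    grows more negative as the tokens move apart.
    """
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    distance = np.abs(pos[:, None] - pos[None, :])   # |i - j|
    return -slopes[:, None, None] * distance[None, :, :]
```

Adding this matrix to the attention logits before the softmax gives every head a fixed, learned-free recency preference, with earlier heads (larger slopes) attending more locally than later ones.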