Learn Before
Imagine a self-attention mechanism where a modification adds a penalty to the attention score between any two words. This penalty is designed to increase in a straight, consistent line as the distance between the words' positions in the sequence grows. What is the most likely behavioral outcome of this modification?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
ALiBi Bias Term Formula
Imagine a self-attention mechanism where a modification adds a penalty to the attention score between any two words. This penalty is designed to increase in a straight, consistent line as the distance between the words' positions in the sequence grows. What is the most likely behavioral outcome of this modification?
In a self-attention mechanism that incorporates a linear bias based on token distance, the bias term added to the attention score is a positive value that decreases as the distance between the query and key increases.
Effect of Distance-Based Attention Penalty