Learn Before
Kerple Logarithmic Bias Formula
The Kerple method for positional bias can be implemented using a logarithmic function to penalize attention based on token distance. For a query at position i and a key at position j, the bias is calculated with the formula: bias = -r1 · log(1 + r2 · |i - j|). Here, r1 and r2 are positive hyperparameters controlling the scale and shape of the logarithmic penalty.
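The formula above can be sketched numerically. A minimal NumPy sketch, assuming r1 and r2 are given fixed illustrative values (in practice they are learned or tuned hyperparameters):

```python
import numpy as np

def kerple_log_bias(seq_len, r1=1.0, r2=1.0):
    """Sketch of the Kerple logarithmic bias matrix.

    r1, r2 are the positive hyperparameters from the formula;
    the defaults here are illustrative placeholders, not values
    from the original method.
    """
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])   # |i - j| for every query/key pair
    return -r1 * np.log(1.0 + r2 * dist)          # penalty grows with distance

bias = kerple_log_bias(4)
# The bias matrix would be added to the attention logits before softmax,
# so more distant key positions receive a larger (more negative) penalty.
```

Note that the bias is zero on the diagonal (a token attending to its own position, |i - j| = 0) and grows only logarithmically with distance, which penalizes far-away tokens more gently than a linear scheme would.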
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Kerple Positional Bias Formula
Kerple Logarithmic Bias Formula
Sandwich Method (Chi et al., 2023)
Formula for Relative Position Scaled by Sinusoidal Wavelength
A transformer model incorporates a positional bias mechanism where a penalty is applied to the attention score between a query and a key. This penalty grows larger as the distance between the query's position and the key's position in the sequence increases. Given the sentence 'The quick brown fox jumps over the lazy dog', which of the following query-key pairs would receive the smallest penalty from this mechanism?
Comparing Positional Bias Functions
A self-attention mechanism is modified to include a bias term that systematically penalizes attention scores between pairs of tokens. The magnitude of this penalty increases as the distance between the tokens' positions in the sequence grows. For which of the following tasks would this modification be most likely to hinder the model's performance?
Learn After
A model designer implements a positional bias using the formula Bias = -β · log(1 + β), where β is a positive value that increases with token distance. The goal is to penalize attention to more distant tokens. By mistake, the designer forgets the leading negative sign, implementing the formula as Bias = β · log(1 + β). What is the most likely effect of this error on the model's behavior?
Analysis of Positional Penalty Growth
Selecting a Positional Bias Strategy