Learn Before
Analyzing the Functional Approach to Positional Bias
A method for incorporating positional information into a model's token-to-token scoring system calculates a bias term using a specific mathematical function that depends only on the relative distance between two token positions. What is the key advantage of using such a continuous mathematical function for this purpose, compared to using a discrete lookup table where each possible relative distance has its own unique, learned bias value?
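The two designs being compared can be sketched side by side. This is a minimal illustration, not the formula of any particular method: the names `lookup_bias`, `table_bias`, `functional_bias`, `MAX_TRAINED_DISTANCE`, and `slope`, as well as the logarithmic form of the function, are hypothetical choices made for the example. The table values are placeholders standing in for learned parameters.

```python
import math

# Hypothetical: the largest relative distance (exclusive) that received
# its own table entry during training.
MAX_TRAINED_DISTANCE = 8

# Approach A: discrete lookup table, one bias value per relative distance.
# The values are placeholders standing in for learned parameters.
lookup_bias = {d: -0.1 * d for d in range(MAX_TRAINED_DISTANCE)}


def table_bias(i: int, j: int) -> float:
    """Bias read from the table, keyed by relative distance |i - j|.

    Raises KeyError for any distance that has no table entry.
    """
    return lookup_bias[abs(i - j)]


def functional_bias(i: int, j: int, slope: float = 0.5) -> float:
    """Bias computed by a continuous function of relative distance.

    The log form is an assumption chosen for illustration only.
    """
    return -slope * math.log1p(abs(i - j))


if __name__ == "__main__":
    # Both approaches depend only on the distance, so these pairs agree.
    print(table_bias(3, 7) == table_bias(23, 27))
    print(functional_bias(3, 7) == functional_bias(23, 27))
```

In both versions the bias for positions 3 and 7 equals the bias for positions 23 and 27, because only the distance of 4 enters the computation. The versions differ in how many parameters they need and in what happens when a distance outside the table's range is requested.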
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
FIRE Positional Bias Formula
A self-attention mechanism is designed so that the positional influence on the attention score between any two tokens depends only on their relative distance, not their absolute locations. For instance, the positional adjustment between the 3rd and 7th tokens is identical to the adjustment between the 23rd and 27th tokens. Which of the following techniques directly implements this principle?
An LLM architect is designing a self-attention mechanism where the positional influence between any two tokens is calculated directly as a bias in the attention score. The core design principle is that this bias must be determined by a specific, continuous mathematical function that takes only the relative distance between the tokens as its input. Which of the following implementation strategies directly realizes this design principle?