Learn Before
Parameter Implications of Offset-Based Positional Bias
Consider a self-attention model that implements relative positional encoding by assigning a unique, learnable bias parameter to each possible offset between a query and a key. If this model is trained exclusively on sequences with a maximum length of 128 tokens, analyze how many distinct learnable bias parameters are required for this positional encoding scheme. Explain your reasoning.
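The count can be verified by enumeration: with 0-indexed positions 0..127, the offset between a key and a query spans −127..+127, giving 2·(128−1)+1 = 255 distinct values. A minimal sketch (assuming 0-indexed positions and the key-minus-query offset convention):

```python
# Count distinct relative offsets (key_pos - query_pos) for sequences
# of up to 128 tokens, assuming 0-indexed positions 0..127.
max_len = 128
offsets = {k - q for q in range(max_len) for k in range(max_len)}

# One learnable bias parameter per distinct offset.
num_params = len(offsets)
print(num_params)  # 255, i.e. offsets -127..+127 = 2*(max_len - 1) + 1
```

The same count falls out in closed form: for sequence length L, offsets range over −(L−1)..+(L−1), so the scheme needs 2L − 1 bias parameters.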
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Generalization Limit of Offset-Specific Biases
Calculating Positional Bias from Offset
In a self-attention mechanism that uses a shared, learnable parameter for each unique relative position offset, which of the following query-key pairs will share the exact same positional bias parameter as the pair with a query at position 8 and a key at position 3?
T5 Bias for Relative Positional Embedding
Parameter Implications of Offset-Based Positional Bias
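The matching rule behind the related question above (query at position 8, key at position 3) can be illustrated with a short check. The candidate pairs below are hypothetical, and computing the offset as key position minus query position is an assumption about the convention:

```python
# Two (query, key) pairs share the same positional bias parameter
# iff they have the same relative offset, taken here as key - query.
ref_q, ref_k = 8, 3
ref_offset = ref_k - ref_q  # 3 - 8 = -5

# Hypothetical candidate pairs for illustration.
candidates = [(10, 5), (5, 10), (12, 7), (6, 1)]
matches = [(q, k) for q, k in candidates if k - q == ref_offset]
print(matches)  # [(10, 5), (12, 7), (6, 1)] -- all have offset -5
```

Under the opposite convention (query minus key) the reference offset would be +5, but the set of matching pairs is unchanged, since equality of offsets is preserved under negation.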