1Cademy - Designing a Relative Positional Bias Scheme

Learn Before

Learned Parameters for T5 Bias

Case Study

Designing a Relative Positional Bias Scheme

Evaluate the two strategies presented in the case study. Which strategy allows the model to more flexibly adapt to the specific relational patterns in the training data, and why?

Updated 2025-10-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

In a specific attention mechanism, the relative distance between any two positions in a sequence is mapped to one of a fixed number of 'buckets'. Each bucket has a single, corresponding scalar bias value that is added to the attention logits. Considering how such a model adapts to data, which statement best describes how the specific scalar bias value for each bucket is determined?
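To make the mechanism concrete, here is a minimal Python sketch of a T5-style bucketed relative bias. It assumes 32 buckets and a maximum distance of 128, which are the defaults in common T5 configurations; treat them as illustrative. Two parts need to be kept apart. The mapping from a relative distance to a bucket index is a fixed formula: exact buckets for small offsets, log-spaced buckets for larger ones, and a separate half of the buckets for each direction. The scalar stored for each bucket is an ordinary trainable parameter, so every query-key pair that lands in a bucket contributes to that bucket's gradient. The class `BucketBias` and its methods are hypothetical names for illustration. Real implementations typically keep one such table per attention head and update it with automatic differentiation instead of a hand-written SGD step.

```python
import math
import random


def relative_position_bucket(rel_pos: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a signed distance (key_pos - query_pos) to a bucket index.

    This mapping is fixed (not learned): half the buckets cover positive
    offsets, half cover non-positive ones; small distances get exact buckets,
    larger ones share log-spaced buckets, clipped at the last bucket.
    """
    half = num_buckets // 2
    offset = half if rel_pos > 0 else 0      # keys after the query use the upper half
    n = abs(rel_pos)
    max_exact = half // 2
    if n < max_exact:
        return offset + n                    # one bucket per small distance
    large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (half - max_exact)
    )
    return offset + min(large, half - 1)     # distant offsets share the last bucket


class BucketBias:
    """Hypothetical container: one trainable scalar per bucket."""

    def __init__(self, num_buckets: int = 32, max_distance: int = 128, seed: int = 0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        # The learned part: small random initial values, adjusted during training.
        self.table = [rng.gauss(0.0, 0.02) for _ in range(num_buckets)]

    def bucket(self, q: int, k: int) -> int:
        return relative_position_bucket(k - q, self.num_buckets, self.max_distance)

    def bias_matrix(self, seq_len: int) -> list[list[float]]:
        """Bias added to the attention logit for every (query, key) pair."""
        return [[self.table[self.bucket(q, k)] for k in range(seq_len)] for q in range(seq_len)]

    def sgd_step(self, grad_logits: list[list[float]], lr: float) -> None:
        """Sum dL/dlogit over all pairs sharing a bucket, then update that bucket's scalar."""
        grads = [0.0] * self.num_buckets
        for q, row in enumerate(grad_logits):
            for k, g in enumerate(row):
                grads[self.bucket(q, k)] += g
        self.table = [b - lr * g for b, g in zip(self.table, grads)]
```

For example, with a 4-token sequence and a gradient of 1.0 on every logit, bucket 0 (the diagonal, distance 0) collects four contributions and moves by `-4 * lr`, while buckets for distances that never occur in the sequence are left unchanged. The bucket layout stays fixed, but the bias values are shaped entirely by the data.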
In a transformer architecture that uses a bucketed approach for relative positional information, the scalar bias associated with each bucket is determined by a predefined, non-trainable mathematical formula.
