Short Answer

Critique of a Relative Positional Bias Method

A language model's architecture assigns a unique, learnable bias value to every possible relative distance between two tokens. For example, one value is learned for a distance of 5, another for a distance of 6, and so on, up to the maximum distance encountered in the training data. Evaluate the suitability of this design for a model intended to process documents of highly variable and potentially unbounded length. Justify your reasoning.
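
To make the mechanism concrete, the following is a minimal sketch of a per-distance bias lookup, not the implementation of any particular model. The names `make_bias_table` and `bias_matrix` are hypothetical, random numbers stand in for learned values, and the sketch assumes the distance is the absolute offset |i - j| (a real model might use signed offsets instead).

```python
import random


def make_bias_table(max_distance, seed=0):
    """Build one scalar per relative distance 0..max_distance.

    Assumption: random values stand in for parameters that would
    normally be learned during training.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(max_distance + 1)]


def bias_matrix(table, seq_len):
    """Return a seq_len x seq_len matrix B with B[i][j] = table[|i - j|].

    Assumption: absolute distance is used, so B is symmetric.
    """
    max_distance = len(table) - 1
    if seq_len - 1 > max_distance:
        # The table holds exactly one entry per distance it was built for,
        # so a larger distance has nothing to look up.
        raise ValueError(
            f"distance {seq_len - 1} exceeds largest table entry {max_distance}"
        )
    return [[table[abs(i - j)] for j in range(seq_len)] for i in range(seq_len)]


# Table covering distances 0..4, as if the longest training sequence had 5 tokens.
table = make_bias_table(max_distance=4)
B = bias_matrix(table, seq_len=5)
```

In this sketch the number of parameters equals the number of distinct distances the table covers, which is a useful quantity to keep in mind when weighing the design against documents of varying length.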

Updated 2025-10-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science