
Generalization Advantage of T5 Positional Bias

The T5 relative positional bias is designed to generalize to sequences longer than those seen during training. Rather than learning a separate parameter for every query-key offset, it groups similar offsets into buckets and shares one learnable parameter across each bucket. This sharing matters most for large offsets: they occur rarely in training data, so a dedicated parameter for each would receive little training signal, whereas a shared parameter is trained by every offset in its bucket and still applies to offsets never seen during training.
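The parameter sharing described above can be illustrated with a minimal Python sketch of an offset-to-bucket mapping. This is a simplified, causal-only approximation and not the reference T5 implementation: the function name `relative_position_bucket` and the defaults `num_buckets=32` and `max_distance=128` are assumptions chosen for illustration. Small offsets each get their own bucket, larger offsets share logarithmically widening buckets, and every offset at or beyond `max_distance` falls into the last bucket.

```python
import math


def relative_position_bucket(offset, num_buckets=32, max_distance=128):
    """Map a query-key offset (query position minus key position) to a bucket index.

    Hypothetical, simplified causal variant: negative offsets are clamped to 0.
    Each bucket index would select one shared learnable bias parameter.
    """
    offset = max(offset, 0)
    max_exact = num_buckets // 2

    # Small offsets: one bucket (one parameter) per exact offset.
    if offset < max_exact:
        return offset

    # Larger offsets: buckets widen logarithmically up to max_distance.
    log_ratio = math.log(offset / max_exact) / math.log(max_distance / max_exact)
    bucket = max_exact + int(log_ratio * (num_buckets - max_exact))

    # Offsets at or beyond max_distance all share the final bucket.
    return min(bucket, num_buckets - 1)


if __name__ == "__main__":
    for offset in (3, 17, 18, 127, 128, 1000, 100000):
        print(offset, relative_position_bucket(offset))
```

Under these assumed settings, an offset of 100000 maps to the same bucket as an offset of 128, so a bias learned on short training sequences is reused unchanged on much longer ones.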

Updated 2026-01-15

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences