Concept

Generalization Advantage of T5 Bias through Parameter Sharing

The T5 relative position bias can generalize to sequences longer than those encountered during training. This ability stems from sharing the same learnable parameter across similar query-key offsets: offsets are grouped into buckets, and every offset in a bucket uses the same bias. The sharing is particularly effective because large offsets are rare in training data, so a separate parameter for each one would receive little training signal. Grouping lets the model apply a learned bias to a novel distance by treating it like the familiar distances in the same group.
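The grouping of offsets can be sketched in a few lines of Python. This is a minimal illustration in the spirit of T5-style bucketing, not the reference implementation: it handles only the causal case (non-negative offsets), and the names `offset_to_bucket` and `bias_for`, along with the defaults `num_buckets=32` and `max_distance=128`, are assumptions chosen for the example.

```python
import math


def offset_to_bucket(offset, num_buckets=32, max_distance=128):
    """Map a non-negative query-key offset (i - j) to a shared bucket index.

    Hypothetical helper for illustration; causal (one-directional) case only.
    """
    if offset < 0:
        raise ValueError("this sketch only handles non-negative offsets")
    max_exact = num_buckets // 2
    # Small offsets each get their own bucket.
    if offset < max_exact:
        return offset
    # Larger offsets share buckets whose width grows logarithmically.
    scaled = math.log(offset / max_exact) / math.log(max_distance / max_exact)
    bucket = max_exact + int(scaled * (num_buckets - max_exact))
    # Everything past the last boundary falls into the final bucket.
    return min(bucket, num_buckets - 1)


def bias_for(offset, bias_table):
    """Look up the shared bias for an offset in a table with one entry per bucket."""
    return bias_table[offset_to_bucket(offset, num_buckets=len(bias_table))]


# One scalar per bucket; in a real model these would be learned (one table per head).
bias_table = [0.1 * b for b in range(32)]

# An offset far beyond the assumed training range reuses an existing parameter.
shared = bias_for(5000, bias_table) == bias_for(128, bias_table)
```

With these assumed defaults, offsets of 128 and beyond all fall into the last bucket, so a distance never seen in training still receives a bias that was trained on shorter, more frequent distances.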


Updated 2026-04-24


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
