Multiple Choice

A model designer is implementing a mechanism to account for the relative distance between tokens in a sequence. The proposed strategy uses a unique, learnable value for each of the first few relative distances (e.g., 1, 2, 3...), but then groups larger distances into a smaller set of shared values, with the size of these groups increasing as the distance grows. What is the primary trade-off this combined approach is designed to optimize?
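
The scheme described in the question can be sketched as a function that maps a relative distance to a bucket index, where each bucket would own one learnable value. This is only an illustrative sketch: the function name `relative_bucket`, the parameters `num_exact`, `num_buckets`, and `max_distance`, and their default values are assumptions made for the example, not details given in the question. Logarithmic spacing is one plausible way to make the group sizes grow with distance.

```python
import math


def relative_bucket(distance: int, num_exact: int = 4,
                    num_buckets: int = 8, max_distance: int = 64) -> int:
    """Map a non-negative relative distance to a bucket index.

    All names and defaults here are hypothetical, chosen for illustration.
    Distances below `num_exact` each get their own bucket; larger distances
    share the remaining buckets, whose widths grow logarithmically.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance < num_exact:
        # Short range: one unique bucket (one learnable value) per distance.
        return distance
    # Long range: position of the distance on a log scale between
    # num_exact and max_distance, spread over the remaining buckets.
    num_log_buckets = num_buckets - num_exact
    ratio = math.log(distance / num_exact) / math.log(max_distance / num_exact)
    bucket = num_exact + int(ratio * num_log_buckets)
    # Everything at or beyond max_distance falls into the last bucket.
    return min(bucket, num_buckets - 1)


if __name__ == "__main__":
    for d in (0, 1, 2, 3, 5, 10, 20, 40, 1000):
        print(d, relative_bucket(d))
```

With these assumed defaults, eight learnable values cover every possible distance: nearby tokens are distinguished exactly, while distant tokens are told apart only coarsely.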

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science