True/False

Consider a transformer model's attention mechanism that uses a set of 'buckets' to store shared parameters for relative positions. For small, non-negative distances between a query and a key, a direct one-to-one correspondence is used where the bucket index is identical to the distance. Based on this rule, an interaction between a query at position 5 and a key at position 2 would be assigned to bucket index 3.

0

1
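
The bucketing rule the question describes (as in T5-style relative position biases) can be sketched as follows. This is a minimal illustration, not a library implementation: the function name, the `max_exact` cutoff, and the omission of log-spaced buckets for larger distances are all assumptions made for brevity.

```python
def relative_position_bucket(query_pos: int, key_pos: int, max_exact: int = 8) -> int:
    """Map a (query, key) position pair to a relative-position bucket.

    For small non-negative distances (0 <= distance < max_exact), the
    bucket index is identical to the distance itself. Real implementations
    additionally bin larger distances logarithmically; that part is
    omitted here because the question only concerns the exact range.
    """
    distance = query_pos - key_pos
    if 0 <= distance < max_exact:
        return distance  # one-to-one: bucket index == distance
    raise NotImplementedError("log-spaced buckets for out-of-range distances omitted")

# Query at position 5, key at position 2: distance = 5 - 2 = 3,
# so the interaction falls in bucket index 3.
print(relative_position_bucket(5, 2))  # → 3
```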

Updated 2025-10-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science