Multiple Choice

A model's attention mechanism uses a system to group relative distances (offsets) between tokens into buckets. The system follows these rules:

  1. Offsets from 0 to 15 are each assigned to their own unique bucket (e.g., offset 10 is in bucket 10).
  2. For larger distances, the buckets cover logarithmically increasing ranges of offsets. Specifically:
    • Bucket 16 covers offsets from 16 to 20.
    • Bucket 17 covers offsets from 21 to 26.
    • Bucket 18 covers offsets from 27 to 33.

Given these rules, which of the following pairs of token offsets would be assigned to the exact same bucket?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Related