Learn Before
An attention mechanism groups relative token distances (offsets) into buckets according to a specific scheme. Match each given offset to its correct bucket number based on the following rules:
- Offsets 0-15 are mapped one-to-one to buckets 0-15.
- For larger distances, buckets are spaced on a logarithmic scale, so each one covers a progressively wider range of offsets:
  - Bucket 16: offsets 16-20
  - Bucket 17: offsets 21-26
  - Bucket 18: offsets 27-33
- This pattern continues up to the final bucket, bucket 32, which covers all offsets from 802 onwards.
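The rules above can be sketched as a small function. The text lists only a few ranges and does not give a closed-form formula, so the log-spaced formula below is an assumption: it is one formula that reproduces every range stated above (0-15, 16-20, 21-26, 27-33, and 802 onwards). The names `offset_to_bucket`, `NUM_EXACT`, `NUM_BUCKETS`, and `MAX_OFFSET` are illustrative and do not come from the source.

```python
import math

NUM_EXACT = 16     # offsets 0-15 each get their own bucket (from the rules above)
NUM_BUCKETS = 32   # index of the final catch-all bucket (from the rules above)
MAX_OFFSET = 802   # first offset that lands in the final bucket (from the rules above)


def offset_to_bucket(offset: int) -> int:
    """Map a non-negative offset to a bucket index.

    Assumption: buckets 16..32 are spaced evenly on a log scale between
    NUM_EXACT and MAX_OFFSET. The source only lists example ranges.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset < NUM_EXACT:
        return offset  # one-to-one region
    # Fraction of the way from NUM_EXACT to MAX_OFFSET, measured on a log scale.
    scale = math.log(offset / NUM_EXACT) / math.log(MAX_OFFSET / NUM_EXACT)
    bucket = NUM_EXACT + math.floor(scale * (NUM_BUCKETS - NUM_EXACT))
    return min(bucket, NUM_BUCKETS)  # everything past MAX_OFFSET shares bucket 32


print([offset_to_bucket(o) for o in (10, 20, 26, 33, 802)])  # [10, 16, 17, 18, 32]
```

For ranges the text does not spell out (buckets 19 through 31), the output is only as reliable as the assumed formula.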
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A model's attention mechanism uses a system to group relative distances (offsets) between tokens into buckets. The system follows these rules:
- Offsets from 0 to 15 are each assigned to their own unique bucket (e.g., offset 10 is in bucket 10).
- For larger distances, the buckets are spaced on a logarithmic scale, so each one covers a progressively wider range of offsets. Specifically:
  - Bucket 16 covers offsets from 16 to 20.
  - Bucket 17 covers offsets from 21 to 26.
  - Bucket 18 covers offsets from 27 to 33.
Given these rules, which of the following pairs of token offsets would be assigned to the exact same bucket?
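One way to check whether two offsets share a bucket is a plain range lookup that uses only the ranges stated in the rules. The sketch below makes no assumption about buckets beyond 18 and returns `None` for offsets the rules do not cover. The helper names `bucket_of` and `same_bucket` are hypothetical, and the sample pairs are illustrative, not the answer options of the question.

```python
from bisect import bisect_right

# Lower bounds of the explicitly listed ranges: buckets 16, 17 and 18.
RANGE_STARTS = [16, 21, 27]
LAST_LISTED_OFFSET = 33  # bucket 18 ends here; later ranges are not given


def bucket_of(offset):
    """Return the bucket for an offset, or None if the rules do not cover it."""
    if 0 <= offset <= 15:
        return offset  # one-to-one region
    if 16 <= offset <= LAST_LISTED_OFFSET:
        # bisect_right counts how many range starts are <= offset (1, 2 or 3).
        return 15 + bisect_right(RANGE_STARTS, offset)
    return None


def same_bucket(a, b):
    """True only when both offsets are covered and map to the same bucket."""
    bucket_a, bucket_b = bucket_of(a), bucket_of(b)
    return bucket_a is not None and bucket_a == bucket_b


print(same_bucket(22, 25))  # True: both fall in 21-26
print(same_bucket(20, 21))  # False: 20 is in bucket 16, 21 is in bucket 17
print(same_bucket(14, 15))  # False: offsets below 16 never share a bucket
```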
An attention mechanism groups relative token distances (offsets) into buckets using the following rules:
- Offsets 0 through 15 are mapped directly to their corresponding buckets (e.g., offset 12 is in bucket 12).
- For larger distances, the buckets are spaced on a logarithmic scale, so each one covers a progressively wider range of offsets:
  - Bucket 16 covers offsets 16-20.
  - Bucket 17 covers offsets 21-26.
  - Bucket 18 covers offsets 27-33.
Following this pattern, the relative position offset of 30 would be assigned to bucket ____.