In a causal self-attention mechanism, a relative position bias is added to the dot product of each query-key pair. The bias is determined by bucketing the relative position offset, which is calculated as (query index - key index). Given the following bucketing rules:
- Offset 0 → Bucket 0
- Offset 1 → Bucket 1
- Offsets 2 or 3 → Bucket 2
- Offsets 4 or greater → Bucket 3
Match each query-key pair below to the correct bias bucket that would be applied.
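The bucketing rules above can be sketched as a small function. This is a minimal illustration that assumes 0-based integer indices; the name `bias_bucket` and the example pairs are hypothetical and are not taken from the question.

```python
def bias_bucket(query_index: int, key_index: int) -> int:
    """Map a query-key pair to its bias bucket using the rules listed above."""
    offset = query_index - key_index
    if offset < 0:
        # Causal masking: a query never attends to a later key,
        # so no bias bucket is defined for this pair.
        raise ValueError("key index exceeds query index; pair is masked")
    if offset == 0:
        return 0
    if offset == 1:
        return 1
    if offset <= 3:  # offsets 2 or 3
        return 2
    return 3  # offsets 4 or greater


# Hypothetical pairs chosen for illustration only.
example_pairs = [(5, 5), (3, 2), (7, 4), (9, 1)]
example_buckets = [bias_bucket(q, k) for q, k in example_pairs]
```

Because the bucket depends only on the offset, any two pairs with the same (query index - key index) receive the same bias.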
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Related
A causal self-attention mechanism uses a relative position bias. The bias is determined by bucketing the relative position offset (query index - key index) according to these rules:
- Offset 0 → Bucket 0
- Offset 1 → Bucket 1
- Offsets 2 or 3 → Bucket 2
- Offsets 4 or greater → Bucket 3
The following grid shows the calculated bias bucket index for each query-key pair in a sequence. One of the bucket indices is incorrect. Identify the query-key pair with the incorrectly calculated bias bucket.
              (Key Index)
            0  1  2  3  4  5
          +------------------
(Query) 0 | 0  X  X  X  X  X
        1 | 1  0  X  X  X  X
        2 | 2  1  0  X  X  X
        3 | 2  2  1  0  X  X
        4 | 3  1  2  1  0  X
        5 | 3  3  2  2  1  0
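One way to check a grid like this is to recompute every unmasked entry from the rules and list any disagreements. The sketch below assumes 0-based indices and represents the masked `X` positions as `None`; the helper names `bucket_for_offset`, `expected_grid`, and `find_mismatches` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]


def bucket_for_offset(offset: int) -> int:
    """Bucket for a non-negative offset (query index - key index)."""
    if offset <= 1:
        return offset  # offset 0 -> bucket 0, offset 1 -> bucket 1
    return 2 if offset <= 3 else 3


def expected_grid(seq_len: int) -> Grid:
    """Recompute the grid from the rules; masked (future) keys are None."""
    return [
        [bucket_for_offset(q - k) if k <= q else None for k in range(seq_len)]
        for q in range(seq_len)
    ]


def find_mismatches(grid: Grid) -> List[Tuple[int, int, Optional[int], Optional[int]]]:
    """Return (query, key, shown, expected) for each unmasked entry that disagrees."""
    expected = expected_grid(len(grid))
    return [
        (q, k, grid[q][k], expected[q][k])
        for q in range(len(grid))
        for k in range(q + 1)
        if grid[q][k] != expected[q][k]
    ]


X = None  # masked position
shown_grid: Grid = [
    [0, X, X, X, X, X],
    [1, 0, X, X, X, X],
    [2, 1, 0, X, X, X],
    [2, 2, 1, 0, X, X],
    [3, 1, 2, 1, 0, X],
    [3, 3, 2, 2, 1, 0],
]
mismatches = find_mismatches(shown_grid)
```

Printing `mismatches` lists each query-key pair whose shown bucket differs from the recomputed one.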
In a causal self-attention mechanism, a relative position bias is added to the dot product of each query-key pair. The bias is determined by bucketing the relative position offset (query index - key index) according to these rules:
- Offset 0 → Bucket 0
- Offset 1 → Bucket 1
- Offsets 2 or 3 → Bucket 2
- Offsets 4 or greater → Bucket 3
Which of the following statements accurately describes a structural property of the resulting bias matrix?
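Structural properties of this kind can be explored by building the matrix for a small sequence and testing a candidate property directly. The sketch below assumes 0-based indices and marks masked entries as `None`; it tests one candidate property, that every diagonal holds a single bucket value, and the helper names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[Optional[int]]]


def bucket_for_offset(offset: int) -> int:
    """Bucket for a non-negative offset (query index - key index)."""
    if offset <= 1:
        return offset
    return 2 if offset <= 3 else 3


def bias_bucket_matrix(seq_len: int) -> Matrix:
    """Lower-triangular matrix of bucket indices; future keys are None."""
    return [
        [bucket_for_offset(q - k) if k <= q else None for k in range(seq_len)]
        for q in range(seq_len)
    ]


def diagonals_are_constant(matrix: Matrix) -> bool:
    """True if each unmasked entry equals its upper-left neighbour."""
    n = len(matrix)
    return all(
        matrix[q][k] == matrix[q - 1][k - 1]
        for q in range(1, n)
        for k in range(1, q + 1)
    )


matrix = bias_bucket_matrix(8)
constant_diagonals = diagonals_are_constant(matrix)
```

The check follows from the definition: moving one step down and one step right leaves (query index - key index) unchanged, so the bucket cannot change along a diagonal.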