In a causal self-attention mechanism, a relative position bias is added to the dot product of each query-key pair. The bias is determined by bucketing the relative position offset, which is calculated as (query index - key index). Given the following bucketing rules:

- Offset 0 → Bucket 0
- Offset 1 → Bucket 1
- Offsets 2 or 3 → Bucket 2
- Offsets 4 or greater → Bucket 3

Match each query-key pair below to the correct bias bucket that would be applied.
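The bucketing rules stated in the question can be sketched as a small lookup function. This is a minimal illustration, not the T5 implementation: the function name `bias_bucket` is hypothetical, and the example index pairs are made up for demonstration rather than taken from the exercise.

```python
def bias_bucket(query_index: int, key_index: int) -> int:
    """Map a query-key pair to its bias bucket using the rules in the question."""
    # Relative position offset as defined in the question.
    offset = query_index - key_index

    # Assumption: under causal masking a query only attends to keys at or
    # before its own position, so a negative offset is treated as invalid.
    if offset < 0:
        raise ValueError("causal attention: key index must not exceed query index")

    if offset == 0:
        return 0
    if offset == 1:
        return 1
    if offset <= 3:  # offsets 2 or 3
        return 2
    return 3  # offsets 4 or greater


# Illustrative pairs (hypothetical, not the pairs from the exercise).
for q, k in [(5, 5), (5, 4), (5, 2), (7, 1)]:
    print(f"query={q}, key={k}, offset={q - k}, bucket={bias_bucket(q, k)}")
```

Each bucket would index one learned scalar bias, so every pair that falls in the same bucket shares the same bias value.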

Learn Before

Visual Representation of T5 Bias Application (nb=3, distmax=5)

Matching
