Learn Before
Calculating a Relative Position Bias Bucket
A model computes a bias for relative positions using a unified bucketing function. This function maps a relative position offset, i-j, to a bucket index, b(i-j), via the piecewise formula below: a direct mapping for small offsets and a logarithmic mapping for larger ones.
Formula:
- If 0 <= i-j < (n_b+1)/2, then b(i-j) = i-j
- If i-j >= (n_b+1)/2, then b(i-j) = min(n_b, (n_b+1)/2 + floor( (log(i-j) - log((n_b+1)/2)) / (log(dist_max) - log((n_b+1)/2)) * (n_b+1)/2 ))
Given the scenario below, calculate the correct bucket index.
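The piecewise formula above can be sketched in code. This is a minimal illustration, not the exact implementation from any library: the function name, the default values n_b=31 and dist_max=128, and the assumption that n_b is odd (so (n_b+1)/2 is an integer) are all choices made here for the example.

```python
import math

def relative_position_bucket(offset, n_b=31, dist_max=128):
    """Map a non-negative relative position offset (i - j) to a bucket index
    using the piecewise formula: a direct mapping for small offsets and a
    logarithmic mapping for larger ones, capped at n_b."""
    half = (n_b + 1) // 2  # threshold (n_b + 1)/2; assumed integer (n_b odd)
    if offset < half:
        # Small offsets: direct one-to-one mapping to a bucket.
        return offset
    # Large offsets: position within the log-spaced range [half, dist_max].
    log_ratio = (math.log(offset) - math.log(half)) / (
        math.log(dist_max) - math.log(half)
    )
    # Scale into the remaining buckets and cap at the last bucket, n_b.
    return min(n_b, half + math.floor(log_ratio * half))
```

For example, with n_b=31 and dist_max=128 the threshold is 16: offsets 0-15 map to themselves, offset 32 falls a third of the way through the log range and lands in bucket 21, and any offset of dist_max or beyond is capped at bucket 31.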
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Position Offsets in Causal vs. Bidirectional Attention
Calculating a Relative Position Bias Bucket
The T5 relative position bias bucketing formula is a piecewise function, treating small and large relative position offsets differently. For small offsets, it uses a direct one-to-one mapping to a bucket. For larger offsets, it transitions to a logarithmic mapping. What is the primary rationale behind this dual-strategy design?
A key characteristic of the T5 relative position bias bucketing formula is that its positional precision decreases with distance: nearby offsets get their own buckets, while distant offsets share increasingly wide buckets. For example, the distinction it makes between relative positions 10 and 20 is far finer-grained than the distinction it makes between relative positions 500 and 510, which may fall into the same bucket.
Visualization of T5 Bias Bucketing