Learn Before
Formula for Logarithmic Bucketing in T5 Bias
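In the T5 bias mechanism, the bucket number for a large offset $d$, where $d \ge n_b/2$, is calculated using a logarithmic scale. The formula is defined as:
$$b(d) = \frac{n_b}{2} + \left\lfloor \frac{\log(d) - \log(n_b/2)}{\log(\mathrm{dist}_{\max}) - \log(n_b/2)} \cdot \frac{n_b}{2} \right\rfloor$$
This equation maps the offset $d$ to a bucket index by normalizing its logarithmic position relative to a maximum distance, $\mathrm{dist}_{\max}$, and scaling it to the available number of logarithmic buckets, $n_b/2$, where $n_b$ is the total number of buckets.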
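As a rough illustration, the formula can be sketched in Python. The function name `log_bucket` and the default values `n_b = 32` and `dist_max = 128` are assumptions chosen for this example, not part of T5 itself. The sketch covers only the logarithmic branch and does not handle offsets beyond `dist_max`.

```python
import math


def log_bucket(d: int, n_b: int = 32, dist_max: int = 128) -> int:
    """Bucket index for a large offset d (d >= n_b / 2), following
    b(d) = n_b/2 + floor((log d - log(n_b/2)) / (log dist_max - log(n_b/2)) * n_b/2).

    Hypothetical helper for illustration; the defaults are assumed values.
    """
    half = n_b / 2
    if d < half:
        raise ValueError("the logarithmic formula applies only to offsets d >= n_b / 2")
    # Position of d within the log-range [log(n_b/2), log(dist_max)], as a fraction.
    ratio = (math.log(d) - math.log(half)) / (math.log(dist_max) - math.log(half))
    # Scale the fraction to the n_b/2 logarithmic buckets and shift past the first n_b/2.
    return int(half) + math.floor(ratio * half)


if __name__ == "__main__":
    for d in (16, 20, 32, 100, 105):
        print(d, log_bucket(d))
```

With these assumed settings, offsets 100 and 105 land in the same bucket, while offsets 16 and 20 land in different ones, which reflects how bucket width grows with distance.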

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Logarithmic Bucketing in T5 Bias
Final Bucket for Offsets Exceeding dist_max in T5 Bias
Parameter Efficiency for Long-Range Dependencies
A model needs to represent the relative distance between elements in a long sequence using a limited number of shared parameters (buckets). The model's designers have determined that precise distance is important for nearby elements, but for elements that are far apart, a less precise, general sense of distance is sufficient. Which bucketing strategy best balances parameter efficiency with this modeling requirement?
In a model that uses logarithmic bucketing for large relative position offsets, it is plausible that the same learned bias parameter would be applied to an offset of 500 as to an offset of 510, while offsets of 10 and 20 would likely receive distinct bias parameters.
Learn After
dist_max Parameter in T5 Bias
A model uses logarithmic bucketing to handle large relative position offsets. The bucket index b(d) for a given distance d is calculated using the formula below. Given a system with 32 total buckets (n_b = 32), a maximum distance of 128 (dist_max = 128), and a specific offset d = 64, what is the resulting bucket index?
b(d) = (n_b/2) + floor( (log(d) - log(n_b/2)) / (log(dist_max) - log(n_b/2)) * (n_b/2) )
(Note: Use the natural logarithm for all log operations.)
Analyzing Parameter Impact on Logarithmic Bucketing
A model calculates a bucket index b(d) for a large relative position offset d using the following formula, where n_b is the total number of buckets and dist_max is a maximum distance:
b(d) = (n_b/2) + floor( (log(d) - log(n_b/2)) / (log(dist_max) - log(n_b/2)) * (n_b/2) )
True or False: This formula establishes a linear relationship between the offset d and the bucket index b(d), meaning that as d increases, the bucket index b(d) increases at a constant rate.