1Cademy - Analyzing Parameter Impact on Logarithmic Bucketing

Learn Before: Formula for Logarithmic Bucketing in T5 Bias

Case Study

Two language models are configured with a mechanism that groups large relative position offsets into a limited number of 'buckets'. Both models use a total of 32 buckets (n_b = 32). For any offset d greater than 16, the bucket index is calculated using the following formula:

b(d) = 16 + floor( (log(d) - log(16)) / (log(dist_max) - log(16)) * 16 )
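A minimal Python sketch of the formula above, useful for trying different dist_max values. The function name log_bucket is hypothetical. The sketch assumes that offsets of 16 or less keep their own index, which is consistent with the formula returning exactly 16 at d = 16. It also clamps results to the last bucket, n_b - 1 = 31, for offsets beyond dist_max. That clamp follows the usual T5 convention and is an assumption, not something stated in this case study.

```python
import math


def log_bucket(d: int, dist_max: int, n_b: int = 32) -> int:
    """Bucket index for a non-negative relative offset d (hypothetical helper)."""
    max_exact = n_b // 2  # 16 when n_b = 32
    # Assumption: offsets up to max_exact keep their own bucket index.
    if d <= max_exact:
        return d
    # Fraction of the log-range [log(16), log(dist_max)] covered by d.
    scaled = (math.log(d) - math.log(max_exact)) / (
        math.log(dist_max) - math.log(max_exact)
    )
    # n_b - max_exact = 16 logarithmic buckets, matching the "* 16" term.
    bucket = max_exact + math.floor(scaled * (n_b - max_exact))
    # Assumption (T5 convention): offsets past dist_max share the last bucket.
    return min(bucket, n_b - 1)


if __name__ == "__main__":
    for dist_max in (128, 512):
        print(dist_max, [log_bucket(d, dist_max) for d in (60, 90, 120)])
```

Evaluating the function at the ends of an offset range for each dist_max shows which bucket indices that range occupies.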

Model A sets its maximum expected offset (dist_max) to 128.
Model B sets its maximum expected offset (dist_max) to 512.

Which model provides finer-grained distinctions for offsets between 60 and 120? Explain your reasoning by referencing how the dist_max parameter influences the formula's output.
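The influence of dist_max can also be read directly from the formula. In the logarithmic region, moving up one bucket multiplies the offset by (dist_max / 16)^(1/16). Doubling an offset, for example from 60 to 120, raises the formula's unfloored value by 16 · log 2 / log(dist_max / 16). The sketch below computes both quantities. The function names are hypothetical, and the split of n_b = 32 into 16 exact and 16 logarithmic buckets is taken from the setup above.

```python
import math


def bucket_growth_factor(dist_max: float, n_b: int = 32) -> float:
    """Ratio between offsets at consecutive bucket boundaries (log region)."""
    max_exact = n_b // 2
    return (dist_max / max_exact) ** (1 / (n_b - max_exact))


def buckets_per_doubling(dist_max: float, n_b: int = 32) -> float:
    """Increase in the unfloored bucket value when an offset doubles."""
    max_exact = n_b // 2
    return (n_b - max_exact) * math.log(2) / math.log(dist_max / max_exact)


if __name__ == "__main__":
    for dist_max in (128, 512):
        print(
            f"dist_max={dist_max}: growth factor {bucket_growth_factor(dist_max):.4f}, "
            f"buckets per doubling {buckets_per_doubling(dist_max):.2f}"
        )
```

A smaller growth factor means each bucket covers a narrower band of offsets, so more distinct buckets fall inside any fixed interval of offsets.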

Updated 2025-10-03