Learn Before
Analysis of T5 Attention Formula Modifications
A standard attention mechanism calculates the unnormalized score between a query q_i and a key k_j using the expression (q_i ⋅ k_j) / sqrt(d_k), where d_k is the dimension of the key vectors. In contrast, the T5 model's approach uses the expression q_i ⋅ k_j + u_{b(i-j)}, where u is a learnable scalar bias dependent on the relative positions of i and j. Identify the two primary modifications in the T5 expression compared to the standard one, and explain the functional role of each change within the attention mechanism.
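The contrast between the two score formulas can be sketched in a few lines. This is a minimal illustration, not T5's actual implementation: the `bias_table` dict and the identity bucketing b(i-j) = i-j are simplifying assumptions (T5 in practice maps relative distances into a fixed number of buckets).

```python
import numpy as np

def standard_score(q, k, d_k):
    # Standard scaled dot-product score: (q . k) / sqrt(d_k).
    # The sqrt(d_k) divisor keeps score magnitudes stable as
    # dimensionality grows, preventing softmax saturation.
    return np.dot(q, k) / np.sqrt(d_k)

def t5_score(q, k, bias_table, i, j):
    # T5-style score: q . k + u_{b(i-j)}.
    # Two changes vs. the standard form:
    #   1. no 1/sqrt(d_k) scaling;
    #   2. a learnable scalar bias added, indexed by the relative
    #      position (i - j), injecting positional information
    #      directly into the attention logits.
    # bias_table is an illustrative stand-in for the learned bias
    # lookup, here keyed by the raw offset i - j (identity bucketing).
    return np.dot(q, k) + bias_table[i - j]

q = np.array([1.0, 0.0])
k = np.array([1.0, 0.0])
print(standard_score(q, k, d_k=2))           # q.k scaled down by sqrt(2)
print(t5_score(q, k, {0: 0.5}, i=3, j=3))    # q.k plus the bias for offset 0
```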
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A developer is implementing an attention layer for a model that incorporates positional information by adding a learnable scalar bias based on the relative distance between tokens. Given a query vector q_i for a token at position i, a key vector k_j for a token at position j, a key dimension d_k, and the specific learnable bias u_{b(i-j)} for their relative position, which of the following expressions correctly computes the unnormalized attention score (the value passed into the softmax function) for this architectural design?
Analysis of T5 Attention Formula Modifications
Evaluating the Design of T5's Unscaled Attention Mechanism