Short Answer

Analysis of T5 Attention Formula Modifications

A standard attention mechanism computes the unnormalized score between a query q_i and a key k_j as (q_i ⋅ k_j) / sqrt(d_k), where d_k is the dimension of the key vectors. T5 instead uses q_i ⋅ k_j + u_{b(i-j)}, where u_{b(i-j)} is a learnable scalar bias indexed by the bucketed relative position b(i-j) between positions i and j. Identify the two primary modifications in the T5 expression relative to the standard one, and explain the functional role of each change within the attention mechanism.
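The two score formulas can be contrasted in a minimal NumPy sketch. This is illustrative only: random values stand in for learned parameters, and a simple clipping bucket function stands in for T5's actual log-spaced bucketing scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, n = 8, 5
Q = rng.standard_normal((n, d_k))  # queries q_i
K = rng.standard_normal((n, d_k))  # keys k_j

# Standard scaled dot-product scores: (q_i . k_j) / sqrt(d_k)
standard = Q @ K.T / np.sqrt(d_k)

# T5-style scores: q_i . k_j + u_{b(i-j)}, with no sqrt(d_k) scaling.
# Simplified bucketing: clip the relative distance i - j into a fixed range
# (T5 itself uses log-spaced buckets; this stand-in keeps the key property
# that the same bucket, hence the same bias, is shared by every query/key
# pair at the same relative offset).
max_dist = 3
rel = np.arange(n)[:, None] - np.arange(n)[None, :]     # relative position i - j
buckets = np.clip(rel, -max_dist, max_dist) + max_dist  # indices 0 .. 2*max_dist
u = rng.standard_normal(2 * max_dist + 1)               # learnable scalars (random here)
t5 = Q @ K.T + u[buckets]
```

Note that the bias term depends only on i - j, so all pairs at the same offset (for example, every diagonal entry) receive an identical additive bias.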

Updated 2025-10-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models
