Learn Before
Calculating T5 Attention Score with Relative Position Bias
In a T5 model's attention head with dimension d = 64, the dot product between a query q_i and a key k_j is 12, and the relative position between them corresponds to a learned bias value u = -4. Calculate the final attention score that is fed into the Softmax function. Provide the final numerical answer.
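A minimal worked sketch of the computation in Python, assuming the convention described in the Related card below, where the relative position bias is added to the raw dot product before the result is scaled by sqrt(d). (The original T5 implementation actually omits the sqrt(d) scaling, but the card supplies d, so the scaled form is used here; variable names are illustrative.)

import math

d = 64       # head dimension given in the question
qk = 12.0    # dot product q_i . k_j
u = -4.0     # learned relative position bias

# Bias added before scaling, then divided by sqrt(d):
score = (qk + u) / math.sqrt(d)   # (12 - 4) / 8
print(score)                      # 1.0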
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a standard attention mechanism, an attention score is computed from a query vector (q) and a key vector (k). Consider a modification where a learnable scalar bias is added directly to the query-key dot product before the result is scaled and passed through a Softmax function. The value of this bias is determined solely by the relative distance between the query and key. How does this specific modification influence the attention mechanism's behavior?
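As a concrete illustration of the modification described above, a minimal NumPy sketch for a single head; the function name and the distance-clipping scheme are illustrative assumptions, not taken from the T5 codebase:

import numpy as np

def attention_scores_with_rel_bias(Q, K, rel_bias, d):
    """Q, K: (n, d) arrays. rel_bias: 1-D learnable array indexed by
    the relative distance i - j, shifted and clipped to its length."""
    n = Q.shape[0]
    raw = Q @ K.T                                          # (n, n) dot products
    idx = np.arange(n)[:, None] - np.arange(n)[None, :]    # relative distance i - j
    max_dist = (len(rel_bias) - 1) // 2
    idx = np.clip(idx, -max_dist, max_dist) + max_dist     # shift to valid indices
    biased = raw + rel_bias[idx]                           # bias added before scaling
    return biased / np.sqrt(d)                             # scores fed to Softmax

Because the bias depends only on the offset i - j, every query-key pair at the same relative distance receives the same additive preference, which makes the attention pattern position-aware even when the input embeddings themselves carry no positional information.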
Calculating T5 Attention Score with Relative Position Bias
A researcher implements a modified attention mechanism where the learnable scalar bias, based on relative position, is applied after the query-key dot product is scaled. The formula used is: Score(i, j) = (q_i · k_j) / sqrt(d) + u(i, j). What is the most significant consequence of this specific modification compared to the standard approach of adding the bias before scaling?
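To see the consequence concretely, a small sketch using the numbers from the card above (qk = 12, u = -4, d = 64); the only difference between the two lines is where the division by sqrt(d) happens:

import math

d, qk, u = 64, 12.0, -4.0
bias_before_scaling = (qk + u) / math.sqrt(d)   # (12 - 4) / 8  =  1.0
bias_after_scaling = qk / math.sqrt(d) + u      # 12 / 8 - 4    = -2.5
print(bias_before_scaling, bias_after_scaling)

When the bias is added after scaling, it bypasses the 1/sqrt(d) division, so its influence relative to the content term is sqrt(d) times larger (8x in this example), and its effective strength grows with the head dimension rather than staying on the same scale as the scaled dot products.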