Learn Before
Dimensional Analysis of the Attention Formula
An attention mechanism operates on a Query matrix $\textbf{Q}$ with dimensions $10 \times 64$, a Key matrix $\textbf{K}$ with dimensions $20 \times 64$, and a Value matrix $\textbf{V}$ with dimensions $20 \times 128$. Given the general attention formula $Att(\textbf{Q}, \textbf{K}, \textbf{V}) = \alpha(\textbf{Q}, \textbf{K})\textbf{V}$, what will be the dimensions of the final output matrix? Explain the steps to arrive at your answer.
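The dimensional analysis can be checked numerically. The sketch below uses random matrices with the stated shapes; the scaled dot-product scoring and row-wise softmax are one common choice for $\alpha(\textbf{Q}, \textbf{K})$, assumed here for illustration.

```python
import numpy as np

# Matrices with the dimensions stated in the question (random values for illustration)
Q = np.random.rand(10, 64)   # Query: 10 queries, key dimension 64
K = np.random.rand(20, 64)   # Key: 20 keys, key dimension 64
V = np.random.rand(20, 128)  # Value: 20 values, value dimension 128

# alpha(Q, K): score each query against each key, then normalize each row.
# Scaled dot-product with softmax is assumed; any row-normalized scoring
# function gives the same shapes.
scores = Q @ K.T / np.sqrt(K.shape[1])                    # shape (10, 20)
alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Att(Q, K, V) = alpha(Q, K) V
output = alpha @ V
print(output.shape)  # (10, 128): one 128-dimensional output per query
```

The shapes trace the two matrix products: $(10 \times 64)(64 \times 20) \to 10 \times 20$ for the weights, then $(10 \times 20)(20 \times 128) \to 10 \times 128$ for the output.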
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Attention Weight Matrix (α)
Sparse Attention
Self-attention layers' first approach
In a general attention mechanism, the output is calculated as a weighted sum of the Value vectors, where the weights are determined by the interaction between the Query and Key vectors. The standard formula is: $Att(\textbf{Q}, \textbf{K}, \textbf{V}) = \alpha(\textbf{Q}, \textbf{K})\textbf{V}$. Consider a scenario where this formula is mistakenly altered to be: . What is the most significant consequence of this modification?
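The "weighted sum of Value vectors" view can be made concrete for a single query. The sketch below uses small illustrative sizes (not from the card) and a softmax over dot-product scores as the assumed weighting function.

```python
import numpy as np

# For one query vector q, the attention output is a weighted sum of the
# value vectors v_j, with weights alpha_j from the query-key interaction.
# Sizes are illustrative assumptions only.
d_k, d_v, n = 4, 6, 3
q = np.random.rand(d_k)        # single query
keys = np.random.rand(n, d_k)  # n key vectors
values = np.random.rand(n, d_v)  # n value vectors

scores = keys @ q                                # one score per key
alpha = np.exp(scores) / np.exp(scores).sum()    # softmax weights, sum to 1
output = sum(a * v for a, v in zip(alpha, values))  # weighted sum of values

# Equivalent to the matrix form alpha @ values
assert np.allclose(output, alpha @ values)
```

Because the weights sum to 1, the output lies in the convex hull of the Value vectors; this is the property that the "weighted sum" formulation guarantees.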
Dimensional Analysis of the Attention Formula
Applying the Attention Mechanism Roles
Self-Attention Output Formula for a Single Query