1Cademy - Value Weight Matrix Formula

Value Weight Matrix Formula

The value weight matrix for the $j$-th attention head in a multi-head attention mechanism is written $\mathbf{W}_{j}^{v} \in \mathbb{R}^{d \times \frac{d}{\tau}}$. This notation states the matrix's shape rather than its entries: $\mathbf{W}_{j}^{v}$ is a real-valued matrix with $d$ rows and $\frac{d}{\tau}$ columns, where $d$ is the model's embedding dimension and $\tau$ is the number of attention heads. Each head therefore maps a $d$-dimensional input to a value vector of dimension $\frac{d}{\tau}$, which requires $d$ to be divisible by $\tau$.
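
The shape constraint can be checked with a minimal Python sketch. The helper names `make_value_weights` and `project`, the example sizes `d = 8` and `tau = 2`, the small Gaussian initialization, and the row-vector convention (input times matrix) are all illustrative assumptions, not part of the definition above.

```python
import random


def make_value_weights(d, tau, seed=0):
    """Return tau value weight matrices, each with d rows and d // tau columns.

    Hypothetical helper: the random initialization is only a placeholder.
    """
    if d % tau != 0:
        raise ValueError("d must be divisible by tau")
    d_head = d // tau  # per-head value dimension, d / tau
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(d_head)] for _ in range(d)]
        for _ in range(tau)
    ]


def project(x, w):
    """Multiply a length-d row vector x by a (d x d_head) matrix w."""
    return [
        sum(x[i] * w[i][k] for i in range(len(x)))
        for k in range(len(w[0]))
    ]


d, tau = 8, 2                      # assumed example sizes
weights = make_value_weights(d, tau)
x = [1.0] * d                      # one d-dimensional input embedding
v = project(x, weights[0])         # value vector for the first head

# heads, rows, columns, value-vector length
print(len(weights), len(weights[0]), len(weights[0][0]), len(v))
# prints: 2 8 4 4
```

With these sizes, each of the 2 heads holds an 8 by 4 matrix and produces a 4-dimensional value vector, so the heads together produce 8 value dimensions, matching $d$.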