1Cademy - Value Weight Matrix Definition ($$\mathbf{W}_j^v \in \mathbb{R}^{d \times \frac{d}{\tau}}$$)

Learn Before

Parameter Matrices for Attention Transformations

Formula

Value Weight Matrix Definition ( $\mathbf{W}_j^v \in \mathbb{R}^{d \times \frac{d}{\tau}}$ )

This formula defines the value weight matrix $\mathbf{W}_j^v \in \mathbb{R}^{d \times \frac{d}{\tau}}$ . In multi-head attention mechanisms, the superscript $v$ indicates a value projection matrix, and the subscript $j$ denotes the $j$ -th attention head. The matrix transforms an input representation of dimension $d$ (the model's embedding dimension) into a value representation of reduced dimension $\frac{d}{\tau}$ , where $\tau$ represents the number of attention heads (or a scaling factor).