Learn Before
Parameter Matrices for Attention Transformations
The matrices $W^q$, $W^k$, and $W^v$ are the parameter matrices that define the transformations used within the self-attention mechanism of a Transformer model. These matrices, which belong to $\mathbb{R}^{d \times d}$, transform the input representation $H$ into queries, keys, and values through the equations: $Q = HW^q$, $K = HW^k$, and $V = HW^v$.
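A minimal NumPy sketch of these projections, assuming an illustrative model dimension $d = 8$ and sequence length $n = 5$ (neither is fixed by the text; the random values stand in for learned weights):

```python
import numpy as np

d = 8          # model dimension (assumed for illustration)
n = 5          # sequence length (assumed for illustration)

rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))      # input representation, one row per token

# Parameter matrices W^q, W^k, W^v, each in R^{d x d}
# (randomly initialized here; in a real model they are learned)
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

Q = H @ W_q    # queries: Q = H W^q
K = H @ W_k    # keys:    K = H W^k
V = H @ W_v    # values:  V = H W^v
print(Q.shape, K.shape, V.shape)  # (5, 8) (5, 8) (5, 8)
```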
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Parameter Matrices for Attention Transformations
Introduce weight matrices in the transformer
Calculating an Output Vector in a Simple Sequence Model
In a simple self-attention mechanism where similarity is measured by dot product and weights are normalized by a softmax function, if a current input vector $x_i$ is perfectly orthogonal to a preceding input vector $x_j$, then $x_j$ will have zero influence on the final output vector $y_i$.

You are calculating the output vector $y_i$ for a single input vector $x_i$ in a sequence using a simple self-attention mechanism that only considers preceding elements. Arrange the following computational steps in the correct chronological order.
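A sketch of this simple mechanism, which also gives a quick numerical check of the orthogonality claim above. It assumes the current position is included among the attended elements ($j \le i$); the helper function and example vectors are illustrative, not from the source:

```python
import numpy as np

def simple_self_attention_output(X: np.ndarray, i: int) -> np.ndarray:
    """Compute y_i using dot-product similarity and softmax weights,
    attending over positions j <= i."""
    scores = X[: i + 1] @ X[i]                # step 1: dot products x_i . x_j
    weights = np.exp(scores - scores.max())   # step 2: softmax normalization
    weights /= weights.sum()
    return weights @ X[: i + 1]               # step 3: weighted sum of inputs

# Orthogonality check: x_0 is orthogonal to x_2, so its score is 0,
# but softmax maps 0 to e^0 normalized, which is a *nonzero* weight.
X = np.array([[1.0, 0.0],    # x_0, orthogonal to x_2
              [1.0, 1.0],    # x_1
              [0.0, 1.0]])   # x_2 (current position)
y_2 = simple_self_attention_output(X, i=2)
print(y_2)  # x_0 still contributes despite the zero dot product
```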
Learn After
Introduce weight matrices in the transformer
Generation of Query, Key, and Value Vectors in Self-Attention
In a self-attention mechanism, instead of directly comparing the raw input vectors of a sequence, each input vector is first multiplied by three separate, learned parameter matrices. This process creates three distinct representations of the original vector before they are used to calculate attention scores and output values. What is the primary analytical advantage of this approach over simply comparing the original input vectors to each other?
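One way to see the advantage numerically: comparing raw inputs by dot product is forced to be symmetric, while separate query and key projections let the score from $x_1$ to $x_2$ differ from the score from $x_2$ to $x_1$. A small sketch (the matrices here are random stand-ins for learned values, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x1, x2 = rng.normal(size=d), rng.normal(size=d)

# Illustrative (unlearned) query and key projection matrices
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))

raw_score = x1 @ x2                        # raw comparison: symmetric by definition
proj_score_12 = (x1 @ W_q) @ (x2 @ W_k)    # x1 acting as query, x2 as key
proj_score_21 = (x2 @ W_q) @ (x1 @ W_k)    # roles reversed

print(raw_score)                  # score(x1, x2) == score(x2, x1) always
print(proj_score_12, proj_score_21)  # the two directions can now differ
```

Because the projections assign each token distinct "asking" (query) and "answering" (key) representations, the model can learn a role-specific, asymmetric notion of relevance rather than being limited to raw geometric similarity between inputs.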