Definition

Parameter Matrices for Attention Transformations

The matrices $\mathbf{W}_j^{q}$, $\mathbf{W}_j^{k}$, and $\mathbf{W}_j^{v}$ are the parameter matrices that define the transformations used within the self-attention mechanism of a Transformer model. Each belongs to $\mathbb{R}^{d \times \frac{d}{\tau}}$, where $j$ indexes the attention head and $\tau$ is the number of heads, so each head operates in a $\frac{d}{\tau}$-dimensional subspace. They transform the input representation $\mathbf{H}$ into queries, keys, and values via $\mathbf{Q}^{[j]} = \mathbf{H} \mathbf{W}_j^{q}$, $\mathbf{K}^{[j]} = \mathbf{H} \mathbf{W}_j^{k}$, and $\mathbf{V}^{[j]} = \mathbf{H} \mathbf{W}_j^{v}$.
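
As a concrete illustration, here is a minimal NumPy sketch of these per-head projections. The token count `n`, model width `d`, and head count `tau` are made-up values for demonstration, and the random matrices stand in for learned parameters.

```python
import numpy as np

# Illustrative dimensions: n tokens, model width d, tau attention heads.
n, d, tau = 4, 8, 2
d_head = d // tau  # each head projects into a d/tau-dimensional subspace

rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))  # input representation H, one row per token

# One (d x d/tau) parameter matrix per role (q, k, v) and per head j.
W_q = [rng.normal(size=(d, d_head)) for _ in range(tau)]
W_k = [rng.normal(size=(d, d_head)) for _ in range(tau)]
W_v = [rng.normal(size=(d, d_head)) for _ in range(tau)]

# Q[j] = H W_j^q,  K[j] = H W_j^k,  V[j] = H W_j^v
j = 0
Q_j = H @ W_q[j]
K_j = H @ W_k[j]
V_j = H @ W_v[j]
print(Q_j.shape, K_j.shape, V_j.shape)  # each is (n, d/tau) -> (4, 4)
```

Stacking the $\tau$ per-head outputs side by side recovers a matrix of width $d$ again, which is why the per-head width is chosen as $\frac{d}{\tau}$.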


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course