Formula

Value Weight Matrix Formula

The formula $\mathbf{W}_{j}^{v} \in \mathbb{R}^{d \times \frac{d}{\tau}}$ defines the value weight matrix for the $j$-th attention head. The matrix $\mathbf{W}_{j}^{v}$ is real-valued, with $d$ rows and $\frac{d}{\tau}$ columns. Here, $d$ typically represents the model's embedding dimension, and $\tau$ is a parameter that usually denotes the number of attention heads, so each head's value projection maps a $d$-dimensional input to a $\frac{d}{\tau}$-dimensional output.
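
The shape constraint can be checked with a small sketch. It is illustrative only and rests on assumptions not stated in the source: that $\tau$ is the number of heads, that $d$ is divisible by $\tau$, that inputs are row vectors multiplied on the left of the matrix, and that the weights are drawn from an arbitrary Gaussian. The names `make_value_weights` and `project` are hypothetical helpers, not part of any library.

```python
import random


def make_value_weights(d, tau, seed=0):
    """Build tau value weight matrices, each with d rows and d // tau columns.

    Assumption: tau is the number of attention heads and divides d evenly.
    The Gaussian initialisation (std 0.02) is an arbitrary illustrative choice.
    """
    if d % tau != 0:
        raise ValueError("d must be divisible by tau")
    rng = random.Random(seed)
    d_head = d // tau
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(d_head)] for _ in range(d)]
        for _ in range(tau)
    ]


def project(x, w):
    """Multiply a row vector x (length d) by a matrix w (d x d_head)."""
    d_head = len(w[0])
    return [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(d_head)]


d, tau = 8, 2
heads = make_value_weights(d, tau)   # one matrix per head j
x = [1.0] * d                        # a single d-dimensional input vector
v_j = project(x, heads[0])           # value vector for head j = 0

# Number of heads, rows per matrix, columns per matrix, projected length
print(len(heads), len(heads[0]), len(heads[0][0]), len(v_j))  # prints: 2 8 4 4
```

With $d = 8$ and $\tau = 2$, each matrix has shape $8 \times 4$, and concatenating the $\tau$ per-head outputs would recover a vector of length $d$.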


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
