
QKV Attention Sharing Mechanisms

Query-Key-Value (QKV) attention models can be designed with a single attention head or with multiple attention heads paired with various sharing mechanisms. While a multi-head model performs attention over several feature sub-spaces in parallel, its Key-Value (KV) cache must retain the key and value representations of every parallel head, denoted $\left\{(\mathbf{K}_{\le i}^{[1]},\mathbf{V}_{\le i}^{[1]}),\dots,(\mathbf{K}_{\le i}^{[\tau]},\mathbf{V}_{\le i}^{[\tau]})\right\}$, where $\tau$ is the number of heads and $\mathbf{K}_{\le i}^{[h]},\mathbf{V}_{\le i}^{[h]}$ collect the keys and values of head $h$ up to step $i$. To manage these representations efficiently, different sharing mechanisms dictate how keys and values are organized and shared across the attention heads.
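As a minimal sketch of this trade-off (assuming PyTorch; the head counts, dimensions, and the helper `kv_cache_shape` are illustrative, not taken from the source), the snippet below contrasts how the number of KV heads changes the cache footprint under full per-head caching, grouped sharing, and a single shared pair:

```python
import torch

batch, seq_len, d_model = 1, 128, 512
n_heads = 8                    # number of query heads (tau)
d_head = d_model // n_heads

def kv_cache_shape(n_kv_heads: int) -> torch.Size:
    """Shape of the cached keys for one layer (the value cache is identical)."""
    k_cache = torch.zeros(batch, n_kv_heads, seq_len, d_head)
    return k_cache.shape

# Multi-Head Attention: every head keeps its own K/V pair, so the cache
# holds all tau entries {(K^[1], V^[1]), ..., (K^[tau], V^[tau])}.
print("MHA:", kv_cache_shape(n_kv_heads=n_heads))  # 8 KV heads

# Grouped sharing: query heads are split into groups, and each group
# shares one K/V pair (here 2 groups of 4 query heads each).
print("GQA:", kv_cache_shape(n_kv_heads=2))        # 2 KV heads

# Fully shared: all query heads attend over a single K/V pair,
# shrinking the cache by a factor of tau.
print("MQA:", kv_cache_shape(n_kv_heads=1))        # 1 KV head
```

Reducing the number of KV heads trades some representational capacity for a proportionally smaller cache, which is why grouped and fully shared variants are common in memory-constrained inference.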


