Concept
QKV Attention Sharing Mechanisms
Query-Key-Value (QKV) attention models can be built with a single attention head or with multiple attention heads combined with various sharing mechanisms. While a multi-head model performs attention over a group of feature sub-spaces in parallel, its Key-Value (KV) cache must retain the key and value representations for all of these parallel heads, denoted $\{(\mathbf{K}_h, \mathbf{V}_h)\}_{h=1}^{H}$ for a model with $H$ heads. To manage these representations efficiently, different sharing mechanisms dictate how keys and values are organized and shared across the attention heads.
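A minimal NumPy sketch of one common sharing scheme, grouped-query attention, where several query heads reuse the same cached key/value head so the cache stores only H_kv rather than H_q head pairs. This is an illustration, not the book's code; all names and shapes are assumptions. Full multi-head attention (H_kv = H_q) and fully shared multi-query attention (H_kv = 1) fall out as special cases.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 4, 8          # sequence length, per-head dimension (illustrative)
H_q, H_kv = 8, 2       # 8 query heads share 2 KV heads (groups of 4)

Q = rng.normal(size=(H_q, seq, d))
K = rng.normal(size=(H_kv, seq, d))   # KV cache holds only H_kv key heads
V = rng.normal(size=(H_kv, seq, d))   # and H_kv value heads

group = H_q // H_kv                   # query heads per shared KV head
out = np.empty_like(Q)
for h in range(H_q):
    k, v = K[h // group], V[h // group]   # shared keys/values for this group
    scores = Q[h] @ k.T / np.sqrt(d)      # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out[h] = weights @ v

print(out.shape)  # (8, 4, 8): full per-head outputs from a smaller KV cache
```

The trade-off: fewer cached KV heads shrink memory and bandwidth costs at inference time, at the price of less per-head specialization in the keys and values.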
Updated 2026-04-23
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences