Concept

Reducing KV Cache Complexity via Head Sharing

The memory footprint of the Key-Value (KV) cache can be reduced not only by caching fewer tokens (i.e., shrinking the sequence length m) but also along other architectural dimensions. A widely adopted approach is to share keys and values across the attention heads of a multi-head self-attention mechanism, so that several query heads read from the same cached key and value tensors and the cache stores far fewer KV heads than query heads.
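The idea can be sketched as follows: with n_q query heads but only n_kv cached KV heads, each KV head serves a group of n_q / n_kv query heads, shrinking the per-layer KV cache by that same factor. Below is a minimal NumPy sketch of this grouped sharing; the function name, shapes, and sizes are illustrative assumptions, not from the source.

```python
import numpy as np

def shared_kv_attention(q, k, v):
    """Attention where query heads share cached KV heads.

    q: (n_q_heads, m, d) -- one query tensor per query head
    k, v: (n_kv_heads, m, d) -- fewer cached KV heads (n_kv_heads
          divides n_q_heads); each serves a group of query heads.
    """
    n_q_heads, m, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kh = h // group  # index of the shared KV head for this query head
        scores = q[h] @ k[kh].T / np.sqrt(d)
        # Numerically stable softmax over the key dimension
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kh]
    return out

rng = np.random.default_rng(0)
n_q_heads, n_kv_heads, m, d = 8, 2, 16, 32  # illustrative sizes
q = rng.standard_normal((n_q_heads, m, d))
k = rng.standard_normal((n_kv_heads, m, d))
v = rng.standard_normal((n_kv_heads, m, d))
out = shared_kv_attention(q, k, v)
print(out.shape)  # output keeps one tensor per query head
```

With these sizes the cache holds 2 KV heads instead of 8, a 4x reduction per layer, while the attention output still has one tensor per query head.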


Updated 2026-04-23


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
