Concept

Components of Fixed-Size KV Caches

In Large Language Models (LLMs), a fixed-size KV cache bounds memory use by partitioning keys and values into distinct sets: those generated at the current decoding step during active inference, those preserved in the model's primary (recent) memory, and older entries stored or encoded in a compressed memory, so that long-range context is retained without exceeding the fixed capacity.
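The three-way split above can be sketched in code. The class below is a minimal, illustrative model, not a real inference implementation: the class name, capacities, and the mean-pooling "compression" step (a stand-in for a learned compression or summarization module) are all assumptions for the sake of the sketch.

```python
from collections import deque

class FixedSizeKVCache:
    """Illustrative fixed-size KV cache: a bounded primary window of recent
    (key, value) pairs, plus a compressed store for evicted older entries.
    Scalars stand in for key/value vectors; mean pooling stands in for a
    learned compression module."""

    def __init__(self, primary_capacity=4, compress_block=2):
        self.primary_capacity = primary_capacity
        self.compress_block = compress_block   # evicted entries pooled per compressed slot
        self.primary = deque()                 # recent (key, value) pairs
        self.compressed = []                   # pooled summaries of older pairs
        self._pending = []                     # evicted pairs awaiting pooling

    def append(self, key, value):
        """Add the KV pair generated at the current decoding step."""
        self.primary.append((key, value))
        while len(self.primary) > self.primary_capacity:
            self._pending.append(self.primary.popleft())
            if len(self._pending) == self.compress_block:
                # Mean-pool the block into one compressed entry.
                ks, vs = zip(*self._pending)
                self.compressed.append((sum(ks) / len(ks), sum(vs) / len(vs)))
                self._pending = []

    def context(self):
        """Everything attention can see: compressed old + recent entries."""
        return self.compressed + self._pending + list(self.primary)
```

After eight decoding steps with `primary_capacity=4` and `compress_block=2`, the four oldest pairs have been folded into two compressed slots while the four newest remain exact, so total storage stays bounded regardless of sequence length.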


Updated 2026-04-23


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences