Concept

Fixed-Size KV Cache for Long-Context Inference

One technique for managing long input sequences during inference is to cap the Key-Value (KV) cache at a fixed size. At each decoding step the model attends only over the cached entries, and once the cache is full, old entries are evicted (most simply, the oldest first, which yields a sliding window over the most recent tokens). Memory use therefore stays constant no matter how long the sequence grows, addressing the challenge of long contexts without unbounded memory resources.
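The sketch below illustrates the idea, assuming a simple sliding-window (oldest-first) eviction policy; the class name `FixedSizeKVCache` and its methods are illustrative, not from the source, and real inference engines typically fuse this bookkeeping into the attention kernel itself.

```python
# A minimal sketch of a fixed-size KV cache with sliding-window eviction.
# Names are illustrative (assumptions, not from the source text).
from collections import deque

import numpy as np


class FixedSizeKVCache:
    """Keeps at most `max_entries` (key, value) pairs; oldest are evicted."""

    def __init__(self, max_entries: int):
        # A deque with maxlen drops the oldest entry automatically when
        # full, so memory stays bounded regardless of sequence length.
        self.keys = deque(maxlen=max_entries)
        self.values = deque(maxlen=max_entries)

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Store the key/value vectors produced for the newest token."""
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Scaled dot-product attention over only the cached tokens."""
        K = np.stack(self.keys)           # (cache_len, d)
        V = np.stack(self.values)         # (cache_len, d)
        scores = K @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ V                # (d,)


# Usage: decode 1000 steps but never hold more than 128 KV pairs.
rng = np.random.default_rng(0)
d = 64
cache = FixedSizeKVCache(max_entries=128)
for _ in range(1000):
    k, v, q = rng.normal(size=(3, d))  # stand-ins for per-token projections
    cache.append(k, v)
    out = cache.attend(q)
print(len(cache.keys))  # 128, regardless of how many steps were run
```

With a plain `deque`, the `maxlen` argument does the eviction automatically: appending the 129th entry silently drops the oldest one, so `attend` always computes attention over at most the last 128 tokens.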

Updated 2026-04-30

Tags: Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Ch.3 Prompting - Foundations of Large Language Models