Learn Before
Classification of Memory Models in LLMs
Memory models designed to address context length limitations in Large Language Models (LLMs) fall into two broad categories. Internal memories are integrated into the model itself and operate by updating the Key-Value (KV) cache. External memories, in contrast, are independent modules that retrieve and supply large amounts of contextual information to the LLM.
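The two categories can be contrasted in a minimal sketch. All class and method names below are hypothetical illustrations, not APIs from any real LLM library: the internal memory appends per-token key/value states to a cache inside the model, while the external memory is a separate store queried at inference time (here with a crude word-overlap score standing in for real vector search).

```python
class InternalMemory:
    """Internal memory: lives inside the model; each generated token
    appends its key/value vectors to the KV cache."""
    def __init__(self):
        self.kv_cache = []  # one (key, value) pair per token

    def update(self, key, value):
        self.kv_cache.append((key, value))

    def __len__(self):
        return len(self.kv_cache)


class ExternalMemory:
    """External memory: an independent store; retrieved entries are
    handed to the model as extra context rather than cached inside it."""
    def __init__(self):
        self.store = []

    def add(self, text):
        self.store.append(text)

    def retrieve(self, query, k=1):
        # crude word-overlap relevance score (a real system would
        # use embeddings and a vector index)
        def score(doc):
            return len(set(query.split()) & set(doc.split()))
        return sorted(self.store, key=score, reverse=True)[:k]


internal = InternalMemory()
for t in range(3):
    internal.update(f"k{t}", f"v{t}")

external = ExternalMemory()
external.add("the cat sat on the mat")
external.add("stock prices fell sharply")

print(len(internal))                            # cached token states
print(external.retrieve("where did the cat sit"))
```

The key structural difference is where the state lives: the KV cache grows with every token the model processes, while the external store is populated ahead of time and only its retrieved slices ever enter the model's context.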
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architectural Adaptation of LLMs for Long Sequences
Linear Attention
Classification of Memory Models in LLMs
Memory Models in LLMs as Context Encoders
PagedAttention for KV Cache Memory Optimization
Strategies for Mitigating KV Cache Memory Usage
A machine learning engineer is deploying a large language model and finds that the system frequently runs out of memory during inference. They are investigating two specific high-load scenarios, both of which involve processing a total of 16,000 tokens:
- Scenario X: Processing a batch of 32 user requests simultaneously, where each request has a context length of 500 tokens.
- Scenario Y: Processing a single user request that involves summarizing a very long document with a context length of 16,000 tokens.
Based on how attention states (keys and values) are managed during inference, which statement best analyzes the memory consumption issue?
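The arithmetic behind the two scenarios can be checked directly. The sketch below assumes illustrative model dimensions (roughly a 7B-class transformer: 32 layers, 32 heads, head dimension 128, 2-byte fp16 entries); the specific numbers are assumptions, but the structure of the formula is general: the KV cache holds two tensors (K and V) per layer, each sized by batch x heads x sequence length x head dimension.

```python
def kv_cache_bytes(batch, seq_len, n_layers=32, n_heads=32,
                   head_dim=128, dtype_bytes=2):
    """Approximate KV-cache size: 2 tensors (K and V) per layer, each
    of shape [batch, n_heads, seq_len, head_dim], dtype_bytes each.
    Parameter defaults are illustrative (roughly a 7B-class model)."""
    return 2 * n_layers * batch * n_heads * seq_len * head_dim * dtype_bytes

x = kv_cache_bytes(batch=32, seq_len=500)     # Scenario X: 32 x 500 tokens
y = kv_cache_bytes(batch=1,  seq_len=16_000)  # Scenario Y: 1 x 16,000 tokens

print(x == y)      # True: both cache 16,000 token states in total
print(x / 2**30)   # ~7.8 GiB under these assumed dimensions
```

Because KV memory scales with the total number of cached token states (batch times sequence length), both scenarios demand essentially the same cache capacity; what differs is per-sequence allocation and fragmentation behavior, not the aggregate footprint.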
Architectural Shift in LLMs due to Long-Sequence Limitations
Diagnosing Inference Failures with Long Documents
Analyzing Memory Constraints in Different LLM Applications
Learn After
Internal Memory in LLMs
External Memory for LLMs
A team of engineers is developing a system to help a language model process an entire book. Their approach involves storing the book's text in a separate, searchable vector database. When a user asks a question about the book, a retrieval mechanism first finds the most relevant paragraphs from the database and then provides only those paragraphs to the language model as context to generate an answer. How would this approach to managing long-term information be best classified?
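The retrieve-then-read pipeline described in this exercise can be sketched as follows. Everything here is a hypothetical toy: the paragraphs, the word-overlap scorer (standing in for embedding similarity over a vector database), and the function names are all illustrative assumptions.

```python
# Toy "book" split into paragraphs; a real system would chunk the
# full text and index embeddings in a vector database.
book_paragraphs = [
    "Chapter 1: Ishmael decides to go to sea.",
    "Chapter 36: Ahab nails a gold doubloon to the mast.",
    "Chapter 135: The whale destroys the Pequod.",
]

def retrieve(query, paragraphs, k=2):
    """Rank paragraphs by crude word overlap with the query."""
    q = set(query.lower().split())
    return sorted(paragraphs,
                  key=lambda p: len(q & set(p.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, paragraphs):
    """Only the retrieved paragraphs enter the model's context window."""
    context = "\n".join(retrieve(query, paragraphs))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what destroys the pequod", book_paragraphs))
```

The defining trait for classification purposes is that the store and the retriever sit entirely outside the model: the model's own parameters and KV cache are untouched, and long-term information reaches it only through the retrieved context.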
You are evaluating different strategies designed to help a language model process information beyond its standard context window. Match each described strategy to the correct classification of memory model.
Architectural Design for a Knowledge-Base Chatbot