Memory-Based Attention as a Form of Internal Memory
As an alternative to efficient attention methods such as sparse or linear attention, the context from preceding tokens can be explicitly encoded using an additional memory model. In this approach, a memory component, denoted Mem, represents and retains the contextual information carried by the keys and values, often in a fixed-size format. Attention for each new token is then computed against Mem rather than against the full set of past key-value pairs, so this strategy bounds the Key-Value (KV) cache that would otherwise grow as inference proceeds.
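To make the mechanism concrete, the sketch below is a minimal illustration in NumPy, not an implementation taken from the text: the class name `FixedSizeMemoryAttention`, the slot count `mem_slots`, and the mean-pooling merge rule are assumptions chosen for brevity (real systems such as the Compressive Transformer compress old activations with functions ranging from pooling to learned convolutions). The sketch keeps Mem at a fixed number of key/value slots and attends over Mem alone, so per-token cost does not grow with sequence length.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class FixedSizeMemoryAttention:
    """Attention over a fixed-size memory Mem instead of a growing KV cache.

    `mem_slots` bounds the number of stored key/value pairs, so the
    per-token attention cost stays constant no matter how many tokens
    have been processed. The merge rule (mean-pooling the two oldest
    slots) is one simple, illustrative compression choice.
    """

    def __init__(self, d_model: int, mem_slots: int):
        self.mem_slots = mem_slots
        # Mem: compressed keys and values, at most `mem_slots` rows each.
        self.mem_k = np.zeros((0, d_model))
        self.mem_v = np.zeros((0, d_model))

    def update(self, k: np.ndarray, v: np.ndarray) -> None:
        """Add one new key/value pair, compressing if Mem is full."""
        self.mem_k = np.vstack([self.mem_k, k[None, :]])
        self.mem_v = np.vstack([self.mem_v, v[None, :]])
        if len(self.mem_k) > self.mem_slots:
            # Merge the two oldest slots into one by averaging, so Mem
            # never exceeds `mem_slots` entries.
            self.mem_k = np.vstack([self.mem_k[:2].mean(0, keepdims=True),
                                    self.mem_k[2:]])
            self.mem_v = np.vstack([self.mem_v[:2].mean(0, keepdims=True),
                                    self.mem_v[2:]])

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Scaled dot-product attention, but over Mem only."""
        d = q.shape[-1]
        scores = self.mem_k @ q / np.sqrt(d)   # shape: (mem_slots,)
        weights = softmax(scores)
        return weights @ self.mem_v            # shape: (d_model,)

# Usage: per-step cost depends on mem_slots, not on sequence length.
rng = np.random.default_rng(0)
attn = FixedSizeMemoryAttention(d_model=16, mem_slots=8)
for t in range(1000):                          # a "long" sequence
    k, v, q = rng.normal(size=(3, 16))
    attn.update(k, v)
    out = attn.attend(q)
print(attn.mem_k.shape)                        # (8, 16): fixed size
```

Because Mem never exceeds `mem_slots` entries, each step costs O(mem_slots · d) regardless of how many tokens have been processed; the price is that distant context is retained only in lossy, compressed form.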
