Memory Representation in Attention Mechanisms
A language model is designed to process extremely long documents. To keep computational costs bounded, its attention mechanism uses a fixed-size memory component. One implementation stores the raw key-value pairs from the last 100 tokens. An alternative implementation maintains a pair of 'summary vectors' that combine information from all preceding tokens into a single fixed-size representation. Compare these two approaches in terms of the kind of historical information each one preserves and the kind each one might lose.
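For concreteness, a minimal Python sketch of the two memory schemes follows. The dimensionality and the cumulative-average combination rule are illustrative assumptions; the question itself does not specify how the summary vectors are computed.

```python
from collections import deque

import numpy as np

WINDOW = 100  # Approach 1's window: raw K/V for the last 100 tokens (from the question)
D = 64        # key/value dimensionality; an assumed value for illustration

class SlidingWindowMemory:
    """Approach 1: exact keys/values, but only for the most recent WINDOW tokens."""
    def __init__(self):
        # deque(maxlen=...) silently evicts the oldest entry once full
        self.keys = deque(maxlen=WINDOW)
        self.values = deque(maxlen=WINDOW)

    def update(self, k, v):
        # Recent history is preserved verbatim; a token 101 steps back is lost entirely.
        self.keys.append(k)
        self.values.append(v)

class SummaryVectorMemory:
    """Approach 2: the whole history compressed into one fixed-size pair of vectors.
    The running (cumulative) average below is one possible combination rule,
    assumed here because the question leaves the rule unspecified."""
    def __init__(self):
        self.k_summary = np.zeros(D)
        self.v_summary = np.zeros(D)
        self.count = 0

    def update(self, k, v):
        # Every token ever seen leaves a trace in the summary, but no individual
        # token can be recovered exactly from the blended vectors.
        self.count += 1
        self.k_summary += (k - self.k_summary) / self.count
        self.v_summary += (v - self.v_summary) / self.count

# After 1,000 tokens: mem1 holds the 100 most recent exact pairs;
# mem2 holds two vectors that faintly reflect all 1,000 tokens.
mem1, mem2 = SlidingWindowMemory(), SummaryVectorMemory()
for _ in range(1000):
    k, v = np.random.randn(D), np.random.randn(D)
    mem1.update(k, v)
    mem2.update(k, v)
```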
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Moving Average of Keys and Values for Memory Component
Weighted Moving Average for Memory Component
Cumulative Average of Keys and Values for Memory Component
An engineer is designing a language model that must process very long sequences while keeping the computational cost of attention constant at each step. They are considering two approaches for the model's memory component:
- Approach 1: The memory stores the raw key-value pairs from the 256 most recent positions in the sequence.
- Approach 2: The memory is a pair of fixed-size 'summary' vectors, computed by combining all preceding key-value pairs into a single condensed representation.
Which statement best analyzes the primary trade-off between these two approaches?
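For illustration, below is a minimal sketch of Approach 2 realized as a recurrent, exponentially weighted update. This is one possible realization; the decay constant, dimensionality, and update rule are assumptions, not part of the question. It makes concrete why the per-step cost stays constant:

```python
import numpy as np

D = 64  # key/value dimensionality; an assumed value for illustration

def update_summary(k_summary, v_summary, k_t, v_t, decay=0.99):
    """One recurrent step: fold the newest key/value into the fixed-size summary.
    The exponential decay (a hypothetical choice) weights recent tokens more
    heavily than old ones; cost is O(1) per token regardless of sequence length."""
    k_summary = decay * k_summary + (1.0 - decay) * k_t
    v_summary = decay * v_summary + (1.0 - decay) * v_t
    return k_summary, v_summary

# Memory stays two D-dimensional vectors no matter how long the sequence grows;
# Approach 1 is also constant-cost per step, but its constant is 256 raw pairs.
k_s, v_s = np.zeros(D), np.zeros(D)
for _ in range(10_000):
    k_t, v_t = np.random.randn(D), np.random.randn(D)
    k_s, v_s = update_summary(k_s, v_s, k_t, v_t)
```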
Recurrent Update for Memory Caching
Optimizing Memory for Long-Sequence Processing