Short Answer

Memory Representation in Attention Mechanisms

A language model is designed to process extremely long documents. To manage computational costs, its attention mechanism uses a fixed-size memory component. One implementation stores the raw key-value pairs from the last 100 tokens. An alternative implementation creates a pair of 'summary vectors' by mathematically combining information from all preceding tokens into a fixed-size representation. Compare these two approaches in terms of the type of historical information each one preserves and the type of information each one might lose.
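For concreteness, here is a minimal sketch of the two memory schemes. It assumes NumPy, a head dimension `D`, a ReLU feature map for the summary variant, and the class names `SlidingWindowMemory` and `SummaryVectorMemory`; none of these details come from the question itself, and the summary variant is modeled on linear-attention-style running sums as one plausible way to "mathematically combine" the history.

```python
import numpy as np

D = 64        # head dimension (illustrative assumption)
WINDOW = 100  # sliding-window size from the question

class SlidingWindowMemory:
    """Stores raw key-value pairs for the most recent WINDOW tokens.
    Attention over the retained tokens is exact; older tokens are dropped."""
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > WINDOW:   # evict the oldest token entirely
            self.keys.pop(0)
            self.values.pop(0)

    def attend(self, q):
        K = np.stack(self.keys)       # (n, D) raw keys
        V = np.stack(self.values)     # (n, D) raw values
        w = np.exp(K @ q / np.sqrt(D))
        w /= w.sum()
        return w @ V                  # softmax attention over the window

class SummaryVectorMemory:
    """Compresses the entire history into fixed-size running sums,
    in the style of linear attention: S = sum_t v_t phi(k_t)^T, z = sum_t phi(k_t)."""
    def __init__(self):
        self.S = np.zeros((D, D))     # key-value outer-product accumulator
        self.z = np.zeros(D)          # key normalizer

    def update(self, k, v):
        phi_k = np.maximum(k, 0.0)    # simple positive feature map (assumption)
        self.S += np.outer(v, phi_k)
        self.z += phi_k

    def attend(self, q):
        phi_q = np.maximum(q, 0.0)
        return (self.S @ phi_q) / (self.z @ phi_q + 1e-9)

# Tiny demo: feed 300 random tokens, then query both memories.
rng = np.random.default_rng(0)
win, summ = SlidingWindowMemory(), SummaryVectorMemory()
for _ in range(300):
    k, v = rng.standard_normal(D), rng.standard_normal(D)
    win.update(k, v)
    summ.update(k, v)
q = rng.standard_normal(D)
print(win.attend(q).shape, summ.attend(q).shape)  # both (64,): fixed-size outputs
```

The sketch makes the trade-off concrete: the window variant preserves exact, token-level detail but hard-forgets everything older than 100 tokens, while the summary variant retains a lossy aggregate of the entire history, so global gist survives but precise token identity and ordering are blurred together.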


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Analysis in Bloom's Taxonomy