Case Study

Evaluating Memory Models in Attention Mechanisms

An engineering team is designing a language model and is considering two approaches for the memory component (Mem) in the attention operation Att(q_i, Mem).

  • Approach 1: The memory component Mem consists of the complete, unaltered set of all key and value vectors generated up to the current position i.
  • Approach 2: The memory component Mem is a compressed, fixed-size summary of all key and value vectors generated up to the current position i.
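To make the two options concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the team's design: the class names `FullMemory` and `SummaryMemory` and the helper `attend` are hypothetical, and the running-mean compression used for Approach 2 is only one of many possible fixed-size summaries.

```python
import math

def attend(query, keys, values):
    """Softmax-weighted average of values, scored by scaled dot product."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]

class FullMemory:
    """Approach 1: store every key and value vector unaltered."""
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def size(self):
        return len(self.keys)  # grows by one entry per position

    def read(self, q):
        return attend(q, self.keys, self.values)

class SummaryMemory:
    """Approach 2 (assumed compression): one running-mean key/value slot."""
    def __init__(self, dim):
        self.key = [0.0] * dim
        self.value = [0.0] * dim
        self.count = 0

    def update(self, k, v):
        self.count += 1
        # Incremental mean: the slot is overwritten, never extended.
        self.key = [m + (x - m) / self.count for m, x in zip(self.key, k)]
        self.value = [m + (x - m) / self.count for m, x in zip(self.value, v)]

    def size(self):
        return 1  # constant regardless of sequence length

    def read(self, q):
        return attend(q, [self.key], [self.value])

# Toy sequence: position 0 carries a distinctive key/value pair,
# followed by four identical later positions.
full, summary = FullMemory(), SummaryMemory(dim=2)
sequence = [([10.0, 0.0], [1.0, 0.0])] + [([0.0, 10.0], [0.0, 1.0])] * 4
for k, v in sequence:
    full.update(k, v)
    summary.update(k, v)

query = [10.0, 0.0]  # matches the key from position 0
full_out = full.read(query)
summary_out = summary.read(query)
```

In this toy setup, `full.size()` equals the number of positions processed, while `summary.size()` stays at 1. Reading with a query that matches the earliest key lets `FullMemory` recover that position's value almost exactly, whereas `SummaryMemory` can only return the blended average, whatever the query is.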

Evaluate the primary trade-off between these two approaches, considering both computational resource usage during text generation and the potential impact on the model's ability to handle long-range dependencies in the text. Justify your evaluation.

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science