Short Answer

Optimizing Memory for Long-Sequence Processing

A language model is built with a memory mechanism that, to ensure constant computational cost, only stores the raw key-value pairs from the most recent 512 positions in a sequence. While processing a 10,000-word document, this model fails to recall specific details mentioned in the first few paragraphs. Based on how the memory is represented, critique the current approach and propose an alternative memory representation strategy that could mitigate this issue of losing long-range information.
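To make the trade-off concrete, here is a minimal sketch contrasting the sliding-window cache described in the question with one possible alternative: a compressive memory that pools evicted key-value pairs into coarse summary slots instead of discarding them (in the spirit of compressive-memory Transformers). All class names, the mean-pooling compression, and the scalar keys/values are illustrative assumptions, not part of any real model's API.

```python
from collections import deque

class SlidingWindowCache:
    """The approach critiqued in the question: keep only the most recent
    `window` key-value pairs; anything older is silently discarded."""
    def __init__(self, window=512):
        self.kv = deque(maxlen=window)  # deque with maxlen evicts oldest on append

    def append(self, key, value):
        self.kv.append((key, value))

class CompressiveCache:
    """Hypothetical alternative: evicted pairs are folded into summary
    slots (here by mean-pooling, standing in for a learned compressor),
    so coarse long-range information survives at O(1)-per-step cost."""
    def __init__(self, window=512, ratio=4):
        self.ratio = ratio              # evicted pairs per summary slot
        self.kv = deque(maxlen=window)  # exact recent memory
        self.summaries = []             # compressed long-range memory
        self._pending = []              # evicted pairs awaiting compression

    def append(self, key, value):
        if len(self.kv) == self.kv.maxlen:
            self._pending.append(self.kv[0])  # capture pair about to be evicted
        self.kv.append((key, value))
        while len(self._pending) >= self.ratio:
            chunk = self._pending[:self.ratio]
            self._pending = self._pending[self.ratio:]
            # mean-pool keys and values into one summary slot
            k = sum(c[0] for c in chunk) / self.ratio
            v = sum(c[1] for c in chunk) / self.ratio
            self.summaries.append((k, v))
```

Feeding a 10,000-token stream through both caches shows the difference: the sliding window retains only the last 512 pairs, while the compressive cache additionally keeps summary slots covering the evicted prefix, so attention over `summaries + kv` can still reach (lossy) information from the opening paragraphs.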


Updated 2025-10-06


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science