Short Answer

Analysis of Memory Scaling in Sequence Processing

Consider two different methods for processing a long sequence of data items one by one.

  • Method A: At each step i, it calculates an output by attending to all previous data items from 1 to i. To do this, it must keep every single past data item in memory.
  • Method B: At each step i, it updates a fixed-size summary of the past. This summary from step i-1 is combined with the current data item at step i to create the new summary for step i. Only this summary is kept in memory.

Analyze how the memory requirement for each method changes as the number of processed data items (i) grows very large. Which method is more suitable for processing extremely long or continuous data streams, and why?
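To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two methods. The function names, the mean-over-cache stand-in for "attending to all past items," and the exponential-moving-average update rule are all hypothetical choices for demonstration, not part of the question itself:

```python
def method_a(stream):
    """Attention-style: keep every past item to produce each output."""
    cache = []
    for x in stream:
        cache.append(x)               # memory grows with i: O(i)
        _ = sum(cache) / len(cache)   # stand-in for attending over items 1..i
    return len(cache)                 # number of items held in memory at the end

def method_b(stream):
    """Recurrent-style: fold each item into a fixed-size summary."""
    summary = 0.0                     # memory stays constant: O(1)
    for x in stream:
        # illustrative update: combine summary from step i-1 with item i
        summary = 0.9 * summary + 0.1 * x
    return summary                    # a single number, regardless of stream length
```

Running both over a stream of 10,000 items, `method_a` ends up holding all 10,000 items while `method_b` still holds only one scalar, which illustrates why a fixed-size summary is the natural fit for unbounded streams.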

Updated 2025-10-06
