Multiple Choice

An engineer is designing a language model that must process very long sequences while keeping the computational cost of attention constant at each step. They are considering two approaches for the model's memory component:

  • Approach 1: The memory stores the raw key-value pairs from the 256 most recent positions in the sequence.
  • Approach 2: The memory is a pair of fixed-size 'summary' vectors, which are calculated by mathematically combining all preceding key-value pairs into a single, condensed representation.
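The two memory designs above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular model: the class names `WindowMemory` and `SummaryMemory` are hypothetical, and the running mean is only one possible way to "mathematically combine" the preceding key-value pairs.

```python
from collections import deque

import numpy as np

WINDOW = 256  # window size taken from Approach 1


class WindowMemory:
    """Approach 1: keep the raw (key, value) pairs of the most recent positions."""

    def __init__(self, window=WINDOW):
        # A bounded deque drops the oldest pair automatically once it is full.
        self.pairs = deque(maxlen=window)

    def update(self, k, v):
        self.pairs.append((k, v))


class SummaryMemory:
    """Approach 2: a pair of fixed-size vectors.

    Assumption: the summary is the running mean of all keys and of all values.
    """

    def __init__(self, d):
        self.k_mean = np.zeros(d)
        self.v_mean = np.zeros(d)
        self.count = 0

    def update(self, k, v):
        self.count += 1
        # Incremental mean: no past key or value needs to be stored.
        self.k_mean += (k - self.k_mean) / self.count
        self.v_mean += (v - self.v_mean) / self.count


def run_demo(steps=1000, d=8, seed=0):
    """Feed the same random key-value stream to both memories."""
    rng = np.random.default_rng(seed)
    keys = rng.normal(size=(steps, d))
    values = rng.normal(size=(steps, d))
    win, summ = WindowMemory(), SummaryMemory(d)
    for k, v in zip(keys, values):
        win.update(k, v)
        summ.update(k, v)
    return keys, values, win, summ
```

In this sketch both memories stay the same size no matter how long the sequence grows, so the per-step cost is bounded in either case. The window holds exact pairs but nothing older than 256 positions, while the summary reflects every position but cannot return any individual pair.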

Which statement best analyzes the primary trade-off between these two approaches?

Updated 2025-09-28

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy
