Multiple Choice

A team is developing a language model designed to process extremely long sequences, but they are constrained by the computational cost of storing and attending to every previous token's key-value pair. They are evaluating two distinct architectural solutions:

  • Solution A: Modify the attention mechanism itself so that each token only attends to a strategically chosen subset of previous tokens, rather than all of them.
  • Solution B: Introduce a separate, fixed-size data structure that periodically summarizes and compresses the key-value pairs from older tokens into a condensed representation.

Which statement best analyzes the fundamental difference in how these two solutions address the long-sequence problem?

0

1

Updated 2025-10-04

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science