Multiple Choice

An attention mechanism processes a long sequence and uses only the 16 most recent key-value pairs to compute its output. One design stores all 16 pairs directly in a cache. An alternative design compresses the same 16 pairs into a single averaged key-value pair. Assuming each key and each value is a single vector of fixed size, what is the ratio of the memory required by the first design (storing all pairs) to that required by the second design (storing the compressed pair)?
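The memory comparison can be checked with a small sketch. It assumes every key and value vector has the same dimension `d` (a hypothetical value of 4 is used here) and that "compression" means an element-wise mean over the window; the helper names `average_vectors` and `num_scalars` are illustrative and not taken from any library.

```python
def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def num_scalars(pairs):
    """Count the scalars needed to store a list of (key, value) pairs."""
    return sum(len(key) + len(value) for key, value in pairs)


d = 4        # hypothetical vector dimension (assumption)
window = 16  # number of recent key-value pairs, as in the question

# Design 1: cache every key-value pair in the window.
full_cache = [([float(i)] * d, [2.0 * i] * d) for i in range(window)]

# Design 2: keep one averaged key and one averaged value.
compressed = [(
    average_vectors([key for key, _ in full_cache]),
    average_vectors([value for _, value in full_cache]),
)]

ratio = num_scalars(full_cache) / num_scalars(compressed)
print(ratio)
```

Under these assumptions the ratio depends only on the number of pairs stored, not on `d`, since the vector dimension cancels out of the division.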

Updated 2025-09-28

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science