Multiple Choice

An engineer is designing a text-generation model and is considering two different configurations for how each new token attends to previous tokens in the sequence.

  • Configuration A: Each new token computes attention scores with only the 16 most recent tokens in the sequence.
  • Configuration B: Each new token computes attention scores with all preceding tokens up to a maximum of 512.
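To make the two configurations concrete, here is a minimal sketch that lists which earlier positions a token attends to and counts how many attention scores are computed over a full sequence. The function names `attended_positions` and `total_scores` are hypothetical, introduced only for illustration, and the sketch assumes a token attends to preceding tokens only, not to itself.

```python
def attended_positions(t, window=None, max_context=512):
    """Return the indices of earlier tokens that the token at position t attends to.

    window=16 models Configuration A (only the 16 most recent tokens).
    window=None models Configuration B (all preceding tokens, capped at max_context).
    Assumption: the token at position t does not attend to itself.
    """
    limit = max_context if window is None else window
    start = max(0, t - limit)
    return range(start, t)


def total_scores(seq_len, window=None, max_context=512):
    """Count the attention scores computed while generating seq_len tokens."""
    return sum(
        len(attended_positions(t, window, max_context)) for t in range(seq_len)
    )


seq_len = 512
scores_a = total_scores(seq_len, window=16)  # Configuration A
scores_b = total_scores(seq_len)             # Configuration B

print(list(attended_positions(100, window=16))[:3])  # [84, 85, 86]
print(list(attended_positions(100))[:3])             # [0, 1, 2]
print(scores_a, scores_b)                            # 8056 130816
```

Under these assumptions, the token at position 100 in Configuration A no longer has access to anything before position 84, while in Configuration B it still attends to position 0. The score counts show how the amount of attention computation differs between the two settings for a 512-token sequence.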

Which statement best analyzes the primary trade-off between these two configurations?

Updated 2025-09-28
