Multiple Choice

A team is optimizing a text-generation model where the computational cost is dominated by the self-attention mechanism during autoregressive decoding. They need to decide between two potential upgrades:

  1. Upgrade A: Doubling the number of layers in the model while keeping the maximum sequence length the same.
  2. Upgrade B: Doubling the maximum sequence length the model can handle while keeping the number of layers the same.

Assuming the model generates a sequence that fills its maximum length capacity in both scenarios, which upgrade would lead to a greater increase in the total computation time, and what is the nature of that increase?
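
One way to reason about the two upgrades is to count attention operations directly. The sketch below is a simplified cost model, not a benchmark: it assumes a key-value cache, so the token generated at step t attends to t positions in every layer, and it ignores feed-forward layers, head count, and hidden size. The function name `attention_decode_cost` and the baseline of 12 layers and 1024 tokens are hypothetical values chosen only for illustration.

```python
def attention_decode_cost(num_layers: int, max_len: int) -> int:
    """Count query-key comparisons over one full autoregressive decode.

    Assumption: with a KV cache, the token at step t attends to t positions
    (itself plus the t - 1 cached ones) in each layer.
    """
    # Sum over decoding steps: 1 + 2 + ... + max_len = max_len * (max_len + 1) / 2
    per_layer = sum(range(1, max_len + 1))
    return num_layers * per_layer


# Hypothetical baseline configuration, for illustration only.
base = attention_decode_cost(num_layers=12, max_len=1024)
upgrade_a = attention_decode_cost(num_layers=24, max_len=1024)  # double the layers
upgrade_b = attention_decode_cost(num_layers=12, max_len=2048)  # double the length

ratio_a = upgrade_a / base
ratio_b = upgrade_b / base
print(f"Upgrade A cost ratio: {ratio_a:.3f}")
print(f"Upgrade B cost ratio: {ratio_b:.3f}")
```

Under these assumptions the attention cost is proportional to the number of layers times roughly the square of the sequence length, so comparing the two printed ratios shows how each upgrade scales.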

Updated 2025-09-28

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science