Short Answer

Analyzing Prefilling Phase Inefficiency

An engineer observes that, during the initial processing of an input sequence (the prefilling phase), the time taken to generate all the necessary key and value vectors grows linearly with the number of tokens in the sequence. Based on the typical data flow for this phase, identify the core inefficiency that this observation points to, and describe the correct, more efficient method for generating these vectors.
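
As a point of reference for the scenario, here is a minimal NumPy sketch of the two data flows for a single attention layer. It is an illustration under stated assumptions, not a real model's implementation: the names kv_sequential and kv_batched, the toy sizes, and the random weights are all hypothetical. Because the whole prompt is known up front, the key and value projection for one token does not depend on any other token, so a per-token loop and a single matrix product give the same result.

```python
import numpy as np

def kv_sequential(X, W_k, W_v):
    """Project one token at a time: n_tokens separate small products."""
    keys = np.stack([x @ W_k for x in X])
    values = np.stack([x @ W_v for x in X])
    return keys, values

def kv_batched(X, W_k, W_v):
    """Project every token at once: one matrix product per weight matrix."""
    return X @ W_k, X @ W_v

# Toy sizes chosen arbitrarily for illustration (assumptions, not from the source).
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 6, 8, 4
X = rng.standard_normal((n_tokens, d_model))    # hidden states entering the layer
W_k = rng.standard_normal((d_model, d_head))    # key projection weights
W_v = rng.standard_normal((d_model, d_head))    # value projection weights

K_seq, V_seq = kv_sequential(X, W_k, W_v)
K_bat, V_bat = kv_batched(X, W_k, W_v)

# Both paths agree numerically; only the number of separate operations differs.
print(np.allclose(K_seq, K_bat), np.allclose(V_seq, V_bat))
```

The batched form hands the hardware one large matrix multiplication instead of n_tokens small ones, which is what lets a parallel accelerator process all prompt tokens together during prefilling.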

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science