Learn Before
Analyzing Prefilling Phase Inefficiency
An engineer observes that during the initial processing of an input sequence, the time taken to generate all the necessary key and value vectors increases linearly with the number of tokens in the sequence. Based on the typical data flow for this phase, identify the core inefficiency in this observation and describe the correct, more efficient method for generating these vectors.
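The contrast this question points at can be sketched for a single attention layer. This is a minimal illustration, not a reference implementation: the function names (`kv_sequential`, `kv_prefill`), the projection matrices `W_k` and `W_v`, and the sizes are all assumptions chosen for the example. Because every prompt token is known up front, the keys and values for all positions can come from one matrix-matrix product instead of a per-token loop.

```python
import numpy as np

def kv_sequential(X, W_k, W_v):
    """Token-by-token projection: one small vector-matrix product per token."""
    keys = [x @ W_k for x in X]      # m separate products, issued one after another
    values = [x @ W_v for x in X]
    return np.stack(keys), np.stack(values)

def kv_prefill(X, W_k, W_v):
    """Whole-sequence projection: one matrix-matrix product covers all tokens."""
    return X @ W_k, X @ W_v

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
m, d_model, d_head = 8, 16, 4
X = rng.standard_normal((m, d_model))        # one row per input token
W_k = rng.standard_normal((d_model, d_head))
W_v = rng.standard_normal((d_model, d_head))

K_seq, V_seq = kv_sequential(X, W_k, W_v)
K_par, V_par = kv_prefill(X, W_k, W_v)
```

Both functions produce the same keys and values and perform the same total arithmetic. The difference is that the batched form exposes all rows to the hardware at once, so on a parallel accelerator the wall-clock time need not grow step by step with the number of tokens.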
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is reviewing a diagram that claims to illustrate the initial processing of an input sequence of tokens ($x_0$ to $x_{m-1}$). The diagram depicts a process where the query, key, and value vectors for the first token ($q_0$, $k_0$, $v_0$) are generated, then used to produce an output, which is then used alongside the second token ($x_1$) to generate the next set of vectors ($q_1$, $k_1$, $v_1$), and so on, iterating through the entire sequence. Why does this diagram incorrectly represent the prefilling phase?
A diagram of the prefilling phase shows how an entire input sequence is processed at once. Arrange the following computational events in the correct chronological order as depicted in such a diagram.