Learn Before
An engineer is reviewing a diagram that claims to illustrate the initial processing of an input sequence of tokens (x_0 to x_{m-1}). The diagram depicts a process in which the query, key, and value vectors for the first token (q_0, k_0, v_0) are generated and used to produce an output; that output is then used alongside the second token (x_1) to generate the next set of vectors (q_1, k_1, v_1), and so on through the entire sequence. Why does this diagram incorrectly represent the prefilling phase?
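As a point of comparison for the diagram in question, the following is a minimal sketch of how prefilling is usually computed in a standard Transformer decoder. It assumes a single attention head in a single layer, no positional encoding, and random weights; the names `prefill`, `token_by_token`, `Wq`, `Wk`, and `Wv` are hypothetical and chosen only for illustration. The point it tries to show is that, within a layer, q_i, k_i, and v_i depend only on x_i, so all m positions can be projected in one matrix multiplication, with a causal mask keeping each position from attending to later ones.

```python
import numpy as np

def prefill(X, Wq, Wk, Wv):
    """Process all m prompt tokens at once (one head, one layer)."""
    # One matrix multiplication per projection covers every position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    m, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # shape (m, m)
    # True above the diagonal marks future positions to be hidden.
    future = np.triu(np.ones((m, m), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; masked entries become exactly zero.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # K and V are what a KV cache would keep for the decoding phase.
    return weights @ V, (K, V)

def token_by_token(X, Wq, Wk, Wv):
    """Same arithmetic, one position per iteration, for comparison only."""
    keys, values, outputs = [], [], []
    for x in X:
        # q, k, v are computed from x alone, not from the previous output.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        keys.append(k)
        values.append(v)
        K, V = np.stack(keys), np.stack(values)
        s = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(s - s.max())
        w /= w.sum()
        outputs.append(w @ V)
    return np.stack(outputs)

rng = np.random.default_rng(0)
m, d_model, d_head = 5, 8, 4                           # illustrative sizes
X = rng.normal(size=(m, d_model))                      # stand-in for x_0 .. x_{m-1}
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out_parallel, (K_cache, V_cache) = prefill(X, Wq, Wk, Wv)
out_loop = token_by_token(X, Wq, Wk, Wv)
print(np.allclose(out_parallel, out_loop))
```

Both functions produce the same outputs, which suggests the loop adds no information: since the whole prompt is known up front, nothing forces position 1 to wait for the output of position 0. The output-feeds-next-input chain drawn in the diagram is characteristic of the decoding phase, where each new token only exists after the previous step has been sampled.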
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A diagram of the prefilling phase shows how an entire input sequence is processed at once. Arrange the following computational events in the correct chronological order as depicted in such a diagram.
Analyzing Prefilling Phase Inefficiency