Learn Before
Diagram of the Prefilling Phase
This diagram illustrates the data flow during the prefilling stage of a Transformer. The entire input sequence, represented as tokens x0 through xm-1, is first converted into vectors by an Embedding Layer. A self-attention layer then processes all of these vectors simultaneously: in a single parallel step it generates the complete set of query vectors (q0 to qm-1), key vectors (k0 to km-1), and value vectors (v0 to vm-1) for the whole input sequence. This all-at-once processing is the defining characteristic of the prefilling phase.
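As a minimal sketch of that parallel step, the PyTorch snippet below computes all query, key, and value vectors with one matrix multiplication per projection. The dimensions are arbitrary toy values and the weight matrices are random stand-ins for learned parameters; none of the names here come from the course material.

```python
import torch

# Toy dimensions for illustration (assumed, not from the course).
m, d_model, d_head = 6, 32, 32   # prompt length, embedding width, head width

# Embedding-layer output for the whole prompt x0 ... xm-1: one row per token.
X = torch.randn(m, d_model)

# Projection weights; random stand-ins for learned parameters.
W_q = torch.randn(d_model, d_head)
W_k = torch.randn(d_model, d_head)
W_v = torch.randn(d_model, d_head)

# Prefilling: a single matmul per projection yields q0..qm-1, k0..km-1,
# and v0..vm-1 for every position at once -- no per-token loop.
Q, K, V = X @ W_q, X @ W_k, X @ W_v   # each of shape (m, d_head)

# Causal self-attention over the full prompt, also in one step.
scores = (Q @ K.T) / d_head ** 0.5                      # (m, m)
causal = torch.triu(torch.ones(m, m, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal, float("-inf"))      # tokens attend only backwards
out = torch.softmax(scores, dim=-1) @ V                 # outputs for all m positions
```

Because every prompt token is available before inference begins, these projections carry no sequential dependency, which is why prefilling can saturate the hardware and behaves as a compute-bound process (see the related card below).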

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Attention Formula for the Prefilling Phase
Prefilling as a Compute-Bound Process
Token Prediction within the Prefilling Phase
When a large language model first processes a user's prompt, it can perform calculations for all words in the prompt simultaneously rather than one by one. What is the fundamental condition that makes this highly parallel approach possible during this initial stage?
LLM Inference Performance Analysis
Rationale for Parallelism in Initial Prompt Processing
Diagram of the Prefilling Phase
Learn After
An engineer is reviewing a diagram that claims to illustrate the initial processing of an input sequence of tokens (x0 to xm-1). The diagram depicts a process where the query, key, and value vectors for the first token (q0, k0, v0) are generated, then used to produce an output, which is then used alongside the second token (x1) to generate the next set of vectors (q1, k1, v1), and so on, iterating through the entire sequence. Why does this diagram incorrectly represent the prefilling phase? (See the sketch after this list.)
A diagram of the prefilling phase shows how an entire input sequence is processed at once. Arrange the following computational events in the correct chronological order as depicted in such a diagram.
Analyzing Prefilling Phase Inefficiency
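For the engineer's-diagram question above, the contrast can be made concrete with a short sketch (reusing the assumed toy shapes from the earlier example): the per-token loop the flawed diagram shows is the shape of autoregressive decoding, while prefilling computes the identical vectors without any sequential dependency.

```python
import torch

m, d_model, d_head = 6, 32, 32
X = torch.randn(m, d_model)                      # embedded prompt tokens
W_q, W_k, W_v = (torch.randn(d_model, d_head) for _ in range(3))

# What the flawed diagram depicts: q_t produced one token at a time,
# each step waiting on the one before -- autoregressive decoding's shape,
# not prefilling's.
qs = [X[t] @ W_q for t in range(m)]

# What prefilling actually does: the same vectors in a single parallel step,
# possible only because every prompt token is known before inference starts.
Q = X @ W_q

# Both routes produce the same values; prefilling simply drops the
# artificial sequential dependency.
assert torch.allclose(torch.stack(qs), Q)
```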