
Diagram of the Prefilling Phase

This diagram illustrates the data flow during the prefilling stage of a Transformer. The entire input sequence, represented as tokens x_0 through x_{m-1}, is first converted into vectors by an embedding layer. A self-attention layer then processes all of these vectors simultaneously: in a single parallel operation, it produces the complete set of query vectors (q_0 to q_{m-1}), key vectors (k_0 to k_{m-1}), and value vectors (v_0 to v_{m-1}) for the whole input sequence. This all-at-once processing is the defining characteristic of the prefilling phase.
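The parallel step described above can be sketched in a few lines of numpy. This is a minimal illustration, not the book's implementation: the sizes, token IDs, and randomly initialized weight matrices (E, W_q, W_k, W_v) are stand-ins for trained parameters, chosen only to show that all m positions are embedded, projected, and attended to in one batched matrix operation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
vocab_size, d_model, m = 100, 8, 5          # m input tokens x_0 .. x_{m-1}

# Embedding table and projection matrices (random stand-ins for trained weights).
E   = rng.normal(size=(vocab_size, d_model))
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

tokens = np.array([3, 17, 42, 8, 99])       # the whole input sequence at once

# Prefilling: embed and project ALL positions in one batched step.
X = E[tokens]                               # (m, d_model)
Q = X @ W_q                                 # q_0 .. q_{m-1} in a single product
K = X @ W_k                                 # k_0 .. k_{m-1}
V = X @ W_v                                 # v_0 .. v_{m-1}

# Causal self-attention over the full sequence in one matrix product.
scores = Q @ K.T / np.sqrt(d_model)         # (m, m) attention scores
mask = np.triu(np.ones((m, m), dtype=bool), k=1)
scores[mask] = -np.inf                      # position i attends only to <= i
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                           # (m, d_model) attended outputs
```

Because every row of Q, K, and V is computed together, the keys and values for all m positions are available immediately; in practice they are cached so that the subsequent decoding phase only computes one new token at a time.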


Updated 2025-10-10


Ch.5 Inference - Foundations of Large Language Models
