
Prefilling Phase in Transformer Inference

The prefilling phase is the initial stage of Transformer inference where the model processes the input sequence, denoted as x, to compute and populate the Key-Value (KV) cache. This stage is named 'prefilling' because its primary function is to prepare and store the key-value vector pairs for every token in the input prompt before the generative decoding process begins.
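Below is a minimal sketch of the idea for one attention head, using plain Python lists. The projection matrices `w_k` and `w_v` and the helpers `matvec`, `prefill`, and `decode_step` are hypothetical names chosen for illustration; they do not come from any particular library. Prefilling projects every token of x into a key vector and a value vector and stores them in the cache. A later decoding step then projects only the single new token and reuses the stored entries.

```python
def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix (list of rows)."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def prefill(x, w_k, w_v):
    """Populate a KV cache from the prompt x (hypothetical single-head sketch).

    x:   list of n token embeddings, each of length d_model
    w_k: hypothetical d_model x d_head key projection matrix
    w_v: hypothetical d_model x d_head value projection matrix
    """
    cache = {"keys": [], "values": []}
    # Real models do this for all prompt tokens at once as one matrix multiply;
    # the per-token loop here is only for readability.
    for token in x:
        cache["keys"].append(matvec(token, w_k))
        cache["values"].append(matvec(token, w_v))
    return cache


def decode_step(cache, new_token, w_k, w_v):
    """During decoding, only the new token is projected; earlier K/V pairs are reused."""
    cache["keys"].append(matvec(new_token, w_k))
    cache["values"].append(matvec(new_token, w_v))
    return cache


# Example: a 3-token prompt with d_model = 4 and d_head = 2 (toy values).
w_k = [[1, 0], [0, 1], [0, 0], [0, 0]]  # keys = first two embedding dims
w_v = [[0, 0], [0, 0], [1, 0], [0, 1]]  # values = last two embedding dims
x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
cache = prefill(x, w_k, w_v)
print(cache["keys"])    # [[1, 2], [5, 6], [9, 10]]
print(cache["values"])  # [[3, 4], [7, 8], [11, 12]]

cache = decode_step(cache, [0, 1, 0, 1], w_k, w_v)
print(len(cache["keys"]))  # 4
```

In a full Transformer the same thing happens in every layer and every head. Because all prompt tokens are known in advance, prefilling can process them in parallel, whereas decoding adds one token's key-value pair at a time.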


Updated 2026-05-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
