Learn Before
Prefilling as an Encoding Process
The prefilling phase can be conceptualized as an encoding process, even though its underlying mechanism is based on token prediction. The primary objective during this phase is not to generate output tokens, but rather to construct a contextual representation of the input sequence in the form of the Key-Value (KV) cache. This cache is then used to condition the subsequent token generation in the decoding phase.
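Below is a minimal sketch of this idea, assuming a toy single-head attention layer with random weights. The function names `prefill` and `decode_step` are hypothetical and only illustrate the division of labor: prefilling processes the whole input and returns the KV cache, while each decoding step reuses and extends that cache.

```python
# Illustrative sketch only: a toy single-head attention showing how
# prefilling builds the KV cache that decoding later conditions on.
# `prefill` and `decode_step` are hypothetical names, not from the source.
import numpy as np

d_model = 16
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefill(prompt_embeddings):
    """Process all prompt tokens in one pass; the lasting output is the KV cache."""
    K = prompt_embeddings @ W_k            # (n_prompt, d_model)
    V = prompt_embeddings @ W_v            # (n_prompt, d_model)
    return K, V

def decode_step(new_token_embedding, K_cache, V_cache):
    """Generate one token's representation by attending over the cached keys/values."""
    q = new_token_embedding @ W_q
    k = new_token_embedding @ W_k
    v = new_token_embedding @ W_v
    K = np.vstack([K_cache, k])            # cache grows by one entry per step
    V = np.vstack([V_cache, v])
    attn = softmax(q @ K.T / np.sqrt(d_model))
    out = attn @ V                         # contextualized representation of the new token
    return out, K, V

# Usage: a 5-token "prompt" followed by two decode steps.
prompt = rng.standard_normal((5, d_model))
K_cache, V_cache = prefill(prompt)         # encoding-like pass over the whole input
for _ in range(2):
    new_tok = rng.standard_normal(d_model)
    _, K_cache, V_cache = decode_step(new_tok, K_cache, V_cache)
print(K_cache.shape)  # (7, d_model): 5 prompt tokens + 2 generated tokens
```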
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for KV Cache Prefilling
Prefix Caching for LLM Inference
Prefilling as an Encoding Process
Disaggregation of Prefilling and Decoding using Pipelined Engines
Prefilling in One Go (Standard Prefilling)
A large language model is given a 1000-token document to process before it begins generating a new, multi-token response. Which statement best analyzes the fundamental computational difference between how the model processes the initial 1000-token document versus how it will subsequently generate each new token for its response?
LLM Inference Performance Analysis
Parallel Self-Attention in the Prefilling Phase
The Role and Output of the Prefilling Phase
You run an internal LLM inference service for empl...
You’re on-call for an internal LLM chat service. M...
You operate a GPU-backed LLM service that uses con...
Your company’s internal LLM service handles many c...
Evaluating a serving design that combines prefix caching with paged KV memory under mixed prompt lengths
Choosing a KV-cache strategy for shared-prefix traffic under GPU memory pressure
Diagnosing and Redesigning KV-Cache Memory Behavior in a Multi-Tenant LLM Serving Stack
Stabilizing latency and GPU memory in a chat-completions service with shared system prompts
Root-cause and mitigation plan for OOMs and latency spikes during shared-prefix, long-generation traffic
Post-incident analysis: KV-cache growth, fragmentation, and shared-prefix reuse in a streaming LLM service
Decoding Network for KV Cache Generation
Learn After
Comparison of Prefilling and BERT Encoding
A machine learning engineer observes that the initial processing of a user's prompt in a large language model takes a significant amount of time, but subsequent token generation is much faster per token. Based on this observation, which statement best analyzes the primary function of this initial processing phase (prefilling)?
Objectives of Inference Phases
The main goal of the prefilling phase in a generative language model is to generate the first token of the model's response, while the computation of the input sequence's contextual representation is a secondary effect of this process.