Learn Before
Token Prediction within the Prefilling Phase
The prefilling phase processes the entire input sequence in a single parallel computation, building the KV cache for every prompt token at once. Because the logits at the final prompt position already define a next-token distribution, this pass also yields the probability distribution for the first output token. In certain scenarios, prefilling can even extend to predicting subsequent tokens, such as the second output token.
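To make this concrete, below is a minimal PyTorch sketch of prefilling with a toy single-layer attention model. All names and sizes here (w_q, lm_head, d_model, the prompt IDs) are illustrative assumptions rather than anything defined in the course; the point is only that one parallel pass both fills the KV cache and produces the distribution over the first output token.

```python
import torch

# Toy prefilling sketch: every prompt token is processed in one parallel pass.
# (Single layer, single head, made-up sizes -- illustrative only.)
torch.manual_seed(0)

vocab_size, d_model = 100, 16
prompt_ids = torch.tensor([[5, 42, 7, 99]])        # (batch=1, seq_len=4)

embed = torch.nn.Embedding(vocab_size, d_model)
w_q = torch.nn.Linear(d_model, d_model, bias=False)
w_k = torch.nn.Linear(d_model, d_model, bias=False)
w_v = torch.nn.Linear(d_model, d_model, bias=False)
lm_head = torch.nn.Linear(d_model, vocab_size, bias=False)

x = embed(prompt_ids)                               # (1, 4, d_model)

# One matrix multiply per projection covers all prompt positions at once --
# this is the parallelism that makes prefilling compute-bound.
q, k, v = w_q(x), w_k(x), w_v(x)

# The keys and values computed here are what the KV cache stores; the
# decoding phase reuses them instead of recomputing them at every step.
kv_cache = (k, v)

# Causal self-attention over the whole prompt in a single call.
attn_out = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, is_causal=True
)

# The logits at the last prompt position define p(first output token | prompt).
first_token_logits = lm_head(attn_out[:, -1, :])
first_token_probs = torch.softmax(first_token_logits, dim=-1)
print(first_token_probs.shape)                      # torch.Size([1, 100])
```

A real model stacks many such layers and heads, but the shape of the computation is the same: the KV cache and the first-token distribution both fall out of one forward pass over the prompt.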
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Self-Attention Formula for the Prefilling Phase
Prefilling as a Compute-Bound Process
When a large language model first processes a user's prompt, it can perform calculations for all tokens in the prompt simultaneously rather than one by one. What is the fundamental condition that makes this highly parallel approach possible during this initial stage?
LLM Inference Performance Analysis
Rationale for Parallelism in Initial Prompt Processing
Diagram of the Prefilling Phase
Learn After
When a large language model is given a long text prompt, the initial processing phase computes representations for all input tokens simultaneously. Given this highly parallel approach, what is the most direct and immediate outcome with respect to generating the first token of the model's response?
Post-Prefilling State Analysis
During the initial processing of an input sequence, the parallel computation phase is not strictly limited to generating the probability distribution for the single, first output token; in certain scenarios it can also extend to predicting subsequent tokens.