Activity (Process)

Token Prediction within the Prefilling Phase

The prefilling phase processes the entire input sequence in a single parallel pass to build the KV cache. The same pass also yields the probability distribution for the first output token, because that distribution is computed from the representation at the final input position. In certain scenarios, this phase can be extended to predict subsequent tokens as well, such as the second output token.
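To make the relationship between the KV cache and the first output token concrete, here is a minimal sketch of prefilling, assuming a toy model with one attention layer, one head, and random weights. The function `prefill`, the `params` dictionary, and all sizes are hypothetical names chosen for illustration; they do not come from any particular library or from the course material.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefill(token_ids, params):
    """Process the whole prompt in one parallel pass.

    Returns the KV cache and the probability distribution
    over the first output token.
    """
    x = params["embed"][token_ids]      # (n, d) embeddings for all positions
    q = x @ params["Wq"]                # queries for all positions at once
    k = x @ params["Wk"]                # keys   -> stored in the KV cache
    v = x @ params["Wv"]                # values -> stored in the KV cache

    n, d = x.shape
    scores = q @ k.T / np.sqrt(d)       # (n, n) attention scores
    # Causal mask: position i may only attend to positions <= i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    h = softmax(scores) @ v             # (n, d) attention outputs

    # Only the last input position is needed to predict the first output token.
    logits = h[-1] @ params["Wout"]     # (vocab,)
    return {"k": k, "v": v}, softmax(logits)

# Toy setup (assumed sizes and random weights, for illustration only).
rng = np.random.default_rng(0)
vocab, d = 50, 16
params = {
    "embed": rng.normal(size=(vocab, d)),
    "Wq": rng.normal(size=(d, d)),
    "Wk": rng.normal(size=(d, d)),
    "Wv": rng.normal(size=(d, d)),
    "Wout": rng.normal(size=(d, vocab)),
}

prompt = np.array([3, 14, 15, 9, 2])
kv_cache, first_token_probs = prefill(prompt, params)
first_token = int(first_token_probs.argmax())  # greedy choice of the first output token
```

In this sketch the cache holds one key row and one value row per prompt token, so a later decoding step would only need to compute the query, key, and value for the newly generated token and attend over the cached rows.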

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course