
The prefilling phase processes the entire input sequence in a single parallel pass to build the KV cache. A key outcome of this pass is the probability distribution over the first output token. In certain scenarios, the phase can also extend to predicting subsequent tokens, such as the second output token.
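As a rough illustration of the two results described above, the sketch below runs a toy prefill step and returns both the KV cache and the first-token distribution. It assumes a single attention layer with one head, no positional encoding, and random weights. The names prefill, rand_matrix, matvec, and the params layout are hypothetical and chosen only for this example; they do not come from any particular library.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the toy model."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def prefill(prompt_ids, params):
    """Toy prefill: return (kv_cache, first_token_probs) for a prompt."""
    embed, w_q, w_k, w_v, w_out = params
    xs = [embed[t] for t in prompt_ids]  # one embedding per prompt token

    # Keys and values at each position do not depend on other positions,
    # so a real implementation computes them in one batched matrix multiply.
    keys = [matvec(w_k, x) for x in xs]
    values = [matvec(w_v, x) for x in xs]

    # In this one-layer toy, only the last position's query is needed
    # to predict the first output token.
    query = matvec(w_q, xs[-1])
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(dim)]

    logits = matvec(w_out, context)  # one logit per vocabulary entry
    return {"keys": keys, "values": values}, softmax(logits)


# Usage with assumed toy sizes: vocabulary of 10 tokens, hidden size 4.
rng = random.Random(0)
VOCAB, DIM = 10, 4
params = (
    rand_matrix(VOCAB, DIM, rng),  # embedding table
    rand_matrix(DIM, DIM, rng),    # query projection
    rand_matrix(DIM, DIM, rng),    # key projection
    rand_matrix(DIM, DIM, rng),    # value projection
    rand_matrix(VOCAB, DIM, rng),  # output projection
)
kv_cache, first_token_probs = prefill([3, 1, 4, 1, 5], params)
```

After this call, kv_cache holds one key and one value vector per prompt token, ready to be reused during decoding, and first_token_probs is a distribution over the vocabulary from which the first output token can be selected.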

Token Prediction within the Prefilling Phase

When a large language model is given a long text prompt, the initial processing phase computes representations for all input tokens simultaneously. Considering this highly parallel approach, what is the most direct and immediate outcome related to generating the first piece of the model's response?

Based on this scenario, describe the two primary results that have been computed by the model at this specific point in time, just before any new words are generated.

Post-Prefilling State Analysis

During the initial processing of an input sequence, the parallel computation phase is strictly limited to generating the probability distribution for only the single, first output token.
