Learn Before
When a language model generates a response, it first processes the user's entire input prompt and then generates the output one token at a time. How does the computational approach for these two phases typically differ in terms of how tokens are handled?
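The asymmetry the question points at can be sketched as a toy simulation: the prompt (prefill) phase handles all input tokens in a single pass, while the generation (decode) phase takes one pass per new token, reusing a cache of past states. The function names and the list standing in for a KV cache below are illustrative, not any particular library's API.

```python
def prefill(prompt_tokens):
    """Process the whole prompt in one parallel pass.
    Returns a stand-in 'KV cache' and the number of forward passes used."""
    kv_cache = list(prompt_tokens)  # stand-in for cached key/value states
    return kv_cache, 1              # one pass, regardless of prompt length

def decode(kv_cache, n_new_tokens):
    """Generate tokens autoregressively: one forward pass per token."""
    passes = 0
    for _ in range(n_new_tokens):
        new_token = len(kv_cache)   # dummy "sampled" token
        kv_cache.append(new_token)  # cache grows by one token each step
        passes += 1
    return kv_cache, passes

prompt = [10, 11, 12, 13]                  # 4 prompt tokens
cache, prefill_passes = prefill(prompt)    # whole prompt in one pass
cache, decode_passes = decode(cache, 3)    # one pass per generated token

print(prefill_passes)  # 1
print(decode_passes)   # 3
```

The point of the sketch: prefill cost is one batched pass over the known prompt, while decode cost scales with the number of tokens generated, which is why the two phases are optimized differently in practice.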
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
When a language model is given an initial text prompt, it can process all of the prompt's tokens together in a single parallel pass, since the entire prompt is known in advance, before it starts generating a response.
Processing Asymmetry in Text Generation