Causation

Latency from Sequential Dependency in Autoregressive Generation

In autoregressive models, each token is generated conditioned on all previously generated tokens. Because of this sequential dependency, the computation for a given output token cannot begin until the preceding token has been produced. Later tokens in a sequence are therefore inherently delayed: for instance, the second output token cannot be predicted until the first has been fully generated. The number of sequential steps, and with it the generation latency, grows with the number of output tokens.
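The dependency can be illustrated with a minimal sketch of a decoding loop. Everything here is an assumption made for illustration: `toy_next_token` is a hypothetical stand-in for a model forward pass (a simple arithmetic function of the prefix, not a real model), and `generate` is a hypothetical helper that counts how many steps had to run one after another.

```python
from typing import List, Tuple


def toy_next_token(prefix: List[int], vocab_size: int = 50) -> int:
    """Hypothetical stand-in for a model forward pass.

    The result depends on every token in the prefix, mimicking the
    causal dependency of autoregressive models.
    """
    weighted = sum((i + 1) * tok for i, tok in enumerate(prefix))
    return (weighted + len(prefix)) % vocab_size


def generate(prompt: List[int], num_new_tokens: int) -> Tuple[List[int], int]:
    """Generate tokens one at a time and count the sequential steps."""
    tokens = list(prompt)
    sequential_steps = 0
    for _ in range(num_new_tokens):
        # This call needs the token appended in the previous iteration,
        # so iterations cannot run in parallel with one another.
        next_tok = toy_next_token(tokens)
        tokens.append(next_tok)
        sequential_steps += 1
    return tokens[len(prompt):], sequential_steps


if __name__ == "__main__":
    new_tokens, steps = generate([3, 1, 4], num_new_tokens=5)
    print("generated:", new_tokens)
    print("sequential steps:", steps)
```

In this sketch the number of sequential steps equals the number of generated tokens. If each step took roughly the same time, total latency would scale linearly with output length, which is the delay described above.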

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences