Learn Before
Latency from Sequential Dependency in Autoregressive Generation
In autoregressive models, the generation of each token is causally dependent on all previously generated tokens. This sequential dependency means that the computation for a given token cannot begin until the computation for the preceding token is complete. As a result, there is an inherent delay in predicting subsequent tokens in a sequence; for instance, the prediction of the second output token is delayed until the first has been fully generated.
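A minimal Python sketch of this sequential dependency, assuming a hypothetical next_token function standing in for a real model's forward pass; the point is that each step consumes the full prefix generated so far, so step t cannot start until step t-1 has finished.

```python
def next_token(prefix: list[str]) -> str:
    # Placeholder prediction: a real model would run a forward pass over `prefix`.
    vocab = ["A", "B", "C", "D"]
    return vocab[len(prefix) % len(vocab)]

def generate(prompt: list[str], num_tokens: int) -> list[str]:
    tokens = list(prompt)
    for _ in range(num_tokens):
        # The context for this prediction is every token generated so far,
        # which is why the steps cannot run in parallel.
        tokens.append(next_token(tokens))
    return tokens

print(generate([], 4))  # ['A', 'B', 'C', 'D']
```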
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Latency from Sequential Dependency in Autoregressive Generation
An autoregressive model is in the process of generating the four-token sequence:
A B C D. At the specific step where it is predicting token D, what information serves as the context for this prediction?
Feasibility of Parallel Token Generation
An autoregressive model is tasked with generating the three-token sequence 'The cat sat'. Arrange the following computational steps in the correct chronological order.
Learn After
Evaluating a Performance Optimization Strategy
A team is comparing two text generation systems to produce a 10-token sequence.
- System A generates tokens one after another. The computation for each token takes 100ms.
- System B is a hypothetical system that can compute all 10 tokens simultaneously, with each token's computation also taking 100ms.
Why does System A take approximately 10 times longer than System B to produce the full sequence?
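A minimal arithmetic sketch of the two totals described above, using the per-token time assumed in the scenario (100ms per token, 10 tokens); the variable names are illustrative only.

```python
per_token_ms = 100
num_tokens = 10

sequential_total_ms = per_token_ms * num_tokens  # System A: steps run one after another
parallel_total_ms = per_token_ms                 # System B: all steps run at once

print(sequential_total_ms, parallel_total_ms)    # 1000 100
```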
True or False: For an autoregressive text generation model, doubling the number of parallel processing units available for computation will cut the total time required to generate a 100-token sequence in half.