Learn Before
Latency from Sequential Dependency in Autoregressive Generation
In autoregressive models, the generation of each token is causally dependent on all previously generated tokens. This sequential dependency means that the computation for a given token cannot begin until the computation for the preceding token is complete. As a result, there is an inherent delay in predicting subsequent tokens in a sequence; for instance, the prediction of the second output token is delayed until the first has been fully generated.
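A minimal Python sketch of this sequential dependency, assuming a hypothetical next_token function standing in for a real model's forward pass; the point is that each step consumes the full prefix generated so far, so step t cannot start until step t-1 has finished.

```python
def next_token(prefix: list[str]) -> str:
    # Placeholder prediction: a real model would run a forward pass over `prefix`.
    vocab = ["A", "B", "C", "D"]
    return vocab[len(prefix) % len(vocab)]

def generate(prompt: list[str], num_tokens: int) -> list[str]:
    tokens = list(prompt)
    for _ in range(num_tokens):
        # The context for this prediction is every token generated so far,
        # which is why the steps cannot run in parallel.
        tokens.append(next_token(tokens))
    return tokens

print(generate([], 4))  # ['A', 'B', 'C', 'D']
```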
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Latency from Sequential Dependency in Autoregressive Generation
An autoregressive model is in the process of generating the four-token sequence:
A B C D. At the specific step where it is predicting token D, what information serves as the context for this prediction?
Feasibility of Parallel Token Generation
An autoregressive model is tasked with generating the three-token sequence 'The cat sat'. Arrange the following computational steps in the correct chronological order.
Learn After
Evaluating a Performance Optimization Strategy
A team is comparing two text generation systems to produce a 10-token sequence.
- System A generates tokens one after another. The computation for each token takes 100ms.
- System B is a hypothetical system that can compute all 10 tokens simultaneously, with each token's computation also taking 100ms.
Why does System A take approximately 10 times longer than System B to produce the full sequence?
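A minimal arithmetic sketch of the two totals described above, using the per-token time assumed in the scenario (100ms per token, 10 tokens); the variable names are illustrative only.

```python
per_token_ms = 100
num_tokens = 10

sequential_total_ms = per_token_ms * num_tokens  # System A: steps run one after another
parallel_total_ms = per_token_ms                 # System B: all steps run at once

print(sequential_total_ms, parallel_total_ms)    # 1000 100
```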
True or False: For an autoregressive text generation model, doubling the number of parallel processing units available for computation will cut the total time required to generate a 100-token sequence in half.