Computational Constraints in Autoregressive Generation
An autoregressive language model generates text one token at a time, where each new token's probability depends on all the tokens that came before it. Based on this sequential dependency, explain the primary challenge this process poses for achieving high computational parallelism during the text generation (or 'inference') phase.
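The sequential bottleneck described above can be made concrete with a short sketch. The "model" below is a hypothetical bigram lookup table used purely for illustration (a real LLM conditions on the entire prefix, not just the last token): the point is the decode loop, where step t cannot begin until step t-1 has emitted its token.

```python
# Minimal sketch of autoregressive (greedy) decoding with a toy model.
# BIGRAM_SCORES is an invented stand-in for the conditional distribution
# P(next token | prior tokens); it is NOT how a real LLM scores tokens.
BIGRAM_SCORES = {
    "The": {"quick": 0.9, "slow": 0.1},
    "quick": {"brown": 0.8, "red": 0.2},
    "brown": {"fox": 0.95, "dog": 0.05},
}

def next_token(prefix):
    """Return the highest-scoring next token given the prefix."""
    scores = BIGRAM_SCORES.get(prefix[-1], {})
    return max(scores, key=scores.get) if scores else None

def generate(prompt, max_new_tokens):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Step t cannot start until step t-1 has appended its token:
        # this data dependency is what blocks parallel generation.
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)
    return tokens

print(generate(["The"], 3))  # -> ['The', 'quick', 'brown', 'fox']
```

Each iteration of the loop consumes the output of the previous iteration, so the iterations cannot be distributed across parallel workers the way independent computations can; this is the inference-time constraint the question asks about.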
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model is in the process of generating a response. It has so far produced the token sequence:
['The', 'quick', 'brown']. To determine the very next token, what is the primary probability distribution the model must compute?
Evaluating Language Model Generation Strategies
Computational Constraints in Autoregressive Generation
To generate a sequence of text, the fundamental computational step for an autoregressive model is to calculate the conditional probability distribution over the single next token, given the prompt and all tokens generated so far; this step is then repeated once per generated token.