Computational Bottlenecks in Autoregressive Generation
When generating text one token at a time, a large language model must repeatedly attend to all previously generated tokens. Explain two distinct computational challenges that arise specifically from the self-attention mechanism during this iterative process, particularly as the generated sequence grows longer.
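Although the card leaves the answer to the reader, a minimal sketch can make the two costs concrete. The NumPy toy below uses made-up sizes (head dimension, step count) that are not taken from the card; it is an illustration of the scaling behavior, not a reference implementation.

```python
import numpy as np

d = 64      # assumed head dimension (toy value)
steps = 8   # assumed number of decode steps (toy value)

k_cache = np.zeros((0, d))   # keys for all past tokens
v_cache = np.zeros((0, d))   # values for all past tokens

for t in range(1, steps + 1):
    q = np.random.randn(1, d)  # query for the newest token only
    k_cache = np.vstack([k_cache, np.random.randn(1, d)])
    v_cache = np.vstack([v_cache, np.random.randn(1, d)])

    # Challenge 1 (compute): the new token's query is scored against
    # every cached key, so step t does O(t * d) work and generating a
    # full sequence of n tokens costs O(n^2 * d) overall.
    scores = q @ k_cache.T / np.sqrt(d)       # shape (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ v_cache                   # shape (1, d)

    # Challenge 2 (memory): the key/value cache grows linearly with t
    # (multiplied by layers and heads in a real model), so memory use
    # scales with sequence length.
    print(f"step {t}: attended over {k_cache.shape[0]} positions, "
          f"cache holds {k_cache.size + v_cache.size} floats, "
          f"output shape {out.shape}")
```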
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Performance Bottlenecks in Autoregressive Generation
A team is deploying a large language model to generate chapter-length summaries of scientific papers. They observe that the time required to generate a summary grows dramatically with the length of the input paper, and the process often fails with out-of-memory errors on their hardware, even when processing one paper at a time. Which component of the model's architecture is the most direct cause of this scaling behavior?
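A back-of-envelope estimate helps explain the out-of-memory failures in this scenario. The sketch below assumes a hypothetical 7B-class model shape and fp16 storage; none of these numbers come from the question itself.

```python
# Assumed model dimensions (hypothetical 7B-class shape, fp16 cache).
n_layers, n_heads, d_head = 32, 32, 128
bytes_per_val = 2  # fp16

def kv_cache_bytes(seq_len: int) -> int:
    # Two tensors (K and V) per layer, each seq_len x n_heads x d_head.
    return 2 * n_layers * seq_len * n_heads * d_head * bytes_per_val

for seq_len in (2_000, 16_000, 128_000):
    print(f"{seq_len:>7} tokens -> {kv_cache_bytes(seq_len) / 2**30:.1f} GiB")
```

Under these assumptions the cache costs roughly 0.5 MiB per token, so a paper-length context of 128k tokens needs tens of GiB for the KV cache alone, before counting the quadratic attention compute.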