1Cademy - A team is deploying a large language model to generate chapter-length summaries of scientific papers. They observe that the time required to generate a summary increases dramatically with the length of the input paper, and the process often fails due to out of memory errors on their hardware, even when processing one paper at a time. Which component of the models architecture is the most direct cause of this specific performance scaling issue?

Learn Before

Self-Attention as a Source of Inference Difficulty in Transformers

Multiple Choice

A team is deploying a large language model to generate chapter-length summaries of scientific papers. They observe that the time required to generate a summary increases dramatically with the length of the input paper, and the process often fails due to 'out of memory' errors on their hardware, even when processing one paper at a time. Which component of the model's architecture is the most direct cause of this specific performance scaling issue?

Updated 2025-10-02

Contributors are:

Who are from:

Learn Before

Related