Multiple Choice

A development team is building a language model based on the standard Transformer architecture to summarize lengthy legal documents, which often exceed 10,000 tokens. They observe that the model's memory usage grows quadratically with input length, leading to out-of-memory errors. Which of the following architectural modifications most directly targets the root cause of this specific memory issue?
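The quadratic growth the question describes comes from standard self-attention: each token attends to every other token, so the attention-score matrix has shape (seq_len, seq_len). A minimal sketch of that accounting (the head count and dtype size below are illustrative assumptions, not values from the question):

```python
def attention_score_bytes(seq_len: int, num_heads: int = 12, dtype_bytes: int = 4) -> int:
    """Memory for the self-attention score matrices of one layer.

    Each head materializes a (seq_len x seq_len) matrix of scores,
    so memory scales with seq_len squared.
    """
    return num_heads * seq_len * seq_len * dtype_bytes

# Growing the input 10x (1,000 -> 10,000 tokens) multiplies the
# score-matrix memory by 100x, which is why long documents hit
# out-of-memory errors.
short_doc = attention_score_bytes(1_000)
long_doc = attention_score_bytes(10_000)
print(long_doc // short_doc)  # → 100
```

Modifications that replace full pairwise attention with a sparse or linearized variant shrink this (seq_len x seq_len) term, which is the root cause the question asks about.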


Updated 2025-10-02


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science