A development team is building a language model based on the standard Transformer architecture to summarize lengthy legal documents, often exceeding 10,000 tokens. They observe that the model's memory usage grows quadratically with the input length, leading to out-of-memory errors. Which of the following architectural modifications most directly targets the root cause of this specific memory issue?
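The quadratic growth comes from standard self-attention: for a sequence of n tokens, each attention head materializes an n x n score matrix (Q @ K.T) before the softmax, so memory scales with n squared. A minimal back-of-the-envelope sketch in Python (the sequence lengths, float width, and helper name are illustrative assumptions, not part of the original question) makes the scaling concrete:

    def score_matrix_megabytes(seq_len: int, bytes_per_float: int = 4) -> float:
        """Memory to materialize one seq_len x seq_len attention score matrix
        (Q @ K.T) in standard full self-attention, in megabytes."""
        return seq_len * seq_len * bytes_per_float / 1e6

    # Doubling the input length roughly quadruples the score-matrix memory:
    for n in (1_000, 5_000, 10_000, 20_000):
        print(f"seq_len={n:>6}: ~{score_matrix_megabytes(n):,.0f} MB per head, per layer")

At 10,000 tokens this is already ~400 MB per head per layer, which is why modifications that avoid materializing the full score matrix (e.g., sparse, sliding-window, or linear attention variants) address the root cause rather than the symptom.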
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
LLM Architecture Selection for a Legal Tech Application
Diagnosing LLM Performance Bottlenecks