Learn Before
An engineer is deploying a large language model for a task that requires processing very long sequences of text. During testing, they observe that the system's memory usage grows linearly with the length of the input sequence, eventually causing the system to run out of memory and fail. Which of the following strategies correctly identifies the underlying trade-off involved in mitigating this specific memory issue?
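The linear memory growth described here typically comes from the KV cache, which stores one key and one value vector per layer for every token processed. A minimal sketch of this accounting is below, using hypothetical model dimensions (the `n_layers`, `n_heads`, and `head_dim` defaults are illustrative, not tied to any specific model); the optional `window` parameter shows how a sliding-window attention scheme caps the cache at a fixed size, trading away full-context attention for bounded memory.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128,
                   bytes_per_elem=2, batch=1, window=None):
    """Estimate KV-cache memory for a decoder-only transformer.

    Without a window, the cache holds every past token, so memory
    grows linearly with seq_len. With a sliding window, only the most
    recent `window` tokens are cached, bounding memory at a constant.
    """
    cached_tokens = seq_len if window is None else min(seq_len, window)
    # Factor of 2 covers both the key and the value tensors per layer.
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem * batch * cached_tokens

# Full attention: memory doubles when the sequence doubles.
print(kv_cache_bytes(4096) / kv_cache_bytes(2048))   # 2.0

# Windowed attention: memory stops growing once seq_len exceeds the window.
print(kv_cache_bytes(32768, window=4096) == kv_cache_bytes(8192, window=4096))  # True
```

This makes the trade-off in the question concrete: techniques like chunked or windowed attention bound memory by discarding distant context, exchanging some modeling quality on long-range dependencies for a fixed memory footprint.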
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chunked and Windowed Attention
Optimizing a Document Summarization Service
Memory-Compute Trade-off in Constrained Environments