Performance Analysis of Sequence Processing Strategies
An engineering team is deploying a large language model to summarize long research papers and is comparing two methods for the initial processing of the input text. Method 1 processes the entire paper in a single, large forward pass. Method 2 breaks the paper into several smaller segments and processes them sequentially, one after another. The team observes that while Method 2 runs successfully on their memory-limited hardware, the initial processing of each paper takes significantly longer than Method 1 was projected to take, had it fit in memory. Based on the computational principles of sequence processing, what is the most likely reason for Method 2's increased processing time?
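The trade-off in the question can be sketched in plain Python. The toy single-head attention below (a hypothetical illustration, not any specific model's implementation) shows that segmented processing with a key/value cache produces the same result as one full pass, but replaces one large, parallelizable computation with a series of smaller passes that must run one after another:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def full_pass_attention(qs, ks, vs):
    # Method 1: one large pass; every position's attention output
    # can be computed in parallel on suitable hardware.
    out = []
    for i, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[j])) for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * vs[j][d] for j in range(i + 1))
                    for d in range(len(vs[0]))])
    return out

def segmented_attention(qs, ks, vs, seg_len):
    # Method 2: process seg_len tokens at a time; earlier keys/values
    # are kept in a cache so later segments can still attend to them.
    k_cache, v_cache, out = [], [], []
    sequential_passes = 0
    for start in range(0, len(qs), seg_len):
        sequential_passes += 1  # each segment is a separate, serialized pass
        for i in range(start, min(start + seg_len, len(qs))):
            k_cache.append(ks[i])
            v_cache.append(vs[i])
            scores = [sum(a * b for a, b in zip(qs[i], k)) for k in k_cache]
            w = softmax(scores)
            out.append([sum(w[j] * v_cache[j][d] for j in range(len(v_cache)))
                        for d in range(len(v_cache[0]))])
    return out, sequential_passes
```

Running both on the same inputs gives numerically identical outputs, but the segmented version requires one serialized pass per segment; each pass is too small to saturate the hardware, which is why wall-clock time grows even though peak memory shrinks.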
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A memory-optimization technique for processing long input sequences in a transformer model involves breaking the sequence into smaller segments and processing them sequentially, one after the other. In contrast, the standard method processes the entire sequence in a single, large computational step. Which statement best analyzes the primary performance trade-off of using the segmented, sequential approach?
Parallelism in Sequence Processing