Performance Analysis of Sequence Processing Strategies
An engineering team is deploying a large language model to summarize long research papers and is comparing two methods for the initial processing of the input text. Method 1 processes the entire paper in a single, large forward pass. Method 2 breaks the paper into several smaller segments and processes them sequentially, one after another. The team observes that while Method 2 runs successfully on their memory-limited hardware, the initial processing of each paper takes significantly longer than Method 1 was projected to take, had it fit in memory. Based on the computational principles of sequence processing, what is the most likely reason for Method 2's increased processing time?
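The trade-off in the question can be sketched in plain Python. The toy single-head attention below (a hypothetical illustration, not any specific model's implementation) shows that segmented processing with a key/value cache produces the same result as one full pass, but replaces one large, parallelizable computation with a series of smaller passes that must run one after another:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def full_pass_attention(qs, ks, vs):
    # Method 1: one large pass; every position's attention output
    # can be computed in parallel on suitable hardware.
    out = []
    for i, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[j])) for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * vs[j][d] for j in range(i + 1))
                    for d in range(len(vs[0]))])
    return out

def segmented_attention(qs, ks, vs, seg_len):
    # Method 2: process seg_len tokens at a time; earlier keys/values
    # are kept in a cache so later segments can still attend to them.
    k_cache, v_cache, out = [], [], []
    sequential_passes = 0
    for start in range(0, len(qs), seg_len):
        sequential_passes += 1  # each segment is a separate, serialized pass
        for i in range(start, min(start + seg_len, len(qs))):
            k_cache.append(ks[i])
            v_cache.append(vs[i])
            scores = [sum(a * b for a, b in zip(qs[i], k)) for k in k_cache]
            w = softmax(scores)
            out.append([sum(w[j] * v_cache[j][d] for j in range(len(v_cache)))
                        for d in range(len(v_cache[0]))])
    return out, sequential_passes
```

Running both on the same inputs gives numerically identical outputs, but the segmented version requires one serialized pass per segment; each pass is too small to saturate the hardware, which is why wall-clock time grows even though peak memory shrinks.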
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A memory-optimization technique for processing long input sequences in a transformer model involves breaking the sequence into smaller segments and processing them sequentially, one after the other. In contrast, the standard method processes the entire sequence in a single, large computational step. Which statement best analyzes the primary performance trade-off of using the segmented, sequential approach?
Parallelism in Sequence Processing