Simultaneous vs. Sequential Phases in Continuous and Standard Batching
A key difference between continuous and standard batching lies in how they schedule the prefilling and decoding phases. In continuous batching, prefilling and decoding can occur in the same iteration across different sequences in the active batch: a newly admitted sequence can be prefilled while other sequences are already decoding. In standard batching, by contrast, the two phases run sequentially for the whole batch: every sequence is prefilled, and then the batch is decoded to completion, before the next batch is processed.
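The contrast can be illustrated with a toy scheduler simulation. This is only a sketch under simplifying assumptions, not how any particular inference engine is implemented: prefilling is modeled as a single iteration per sequence, each request is described only by its number of decode steps (assumed to be at least 1), and the function names `standard_batching`, `continuous_batching`, and `has_mixed_step` are hypothetical.

```python
from collections import deque


def standard_batching(decode_lens, batch_size):
    """Toy model: prefill the whole batch, then decode it to completion."""
    log = []  # one dict per iteration: request id -> phase run in that iteration
    queue = deque(enumerate(decode_lens))
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        # Phase 1: every sequence in the batch is prefilled together.
        log.append({rid: "prefill" for rid, _ in batch})
        # Phase 2: decode until the longest sequence in the batch finishes.
        for step in range(max(n for _, n in batch)):
            log.append({rid: "decode" for rid, n in batch if step < n})
    return log


def continuous_batching(decode_lens, batch_size):
    """Toy model: refill free slots every iteration (assumes decode_lens >= 1)."""
    log = []
    queue = deque(enumerate(decode_lens))
    active = {}  # request id -> remaining decode steps
    while queue or active:
        step = {}
        admitted = set()
        # Newly admitted sequences are prefilled in this iteration...
        while queue and len(active) < batch_size:
            rid, n = queue.popleft()
            active[rid] = n
            admitted.add(rid)
            step[rid] = "prefill"
        # ...while sequences admitted earlier decode one token in the same iteration.
        for rid in list(active):
            if rid in admitted:
                continue
            step[rid] = "decode"
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]  # the slot becomes free for the next iteration
        log.append(step)
    return log


def has_mixed_step(log):
    """True if any single iteration contains both prefill and decode work."""
    return any({"prefill", "decode"} <= set(step.values()) for step in log)


if __name__ == "__main__":
    lens = [1, 3, 2]  # decode steps needed by requests 0, 1, 2
    for name, fn in [("standard", standard_batching), ("continuous", continuous_batching)]:
        log = fn(lens, batch_size=2)
        print(name, "iterations:", len(log), "mixed phases:", has_mixed_step(log))
```

In this toy model, with three requests and a batch size of 2, the continuous scheduler admits the third request as soon as the first one finishes, so one iteration contains both a prefill and a decode. The standard scheduler never mixes the two phases and leaves a slot idle until the whole batch is done.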
Tags
Foundations of Large Language Models
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Inference System Optimization
An AI development team is deploying two different services. Service X is a real-time conversational agent where minimizing the response time for each user's turn is the top priority. Service Y is an offline system that processes a massive queue of documents for analysis, where maximizing the total number of documents processed per day is the main goal. Considering the trade-offs between different batching methods, which approach is best suited for each service?
Match each batching strategy with its corresponding primary goal and performance trade-off.