Example

Example of Throughput Gain with Increased Batch Size

To see the efficiency gain from batching, compare processing four sequences with a batch size of one versus a batch size of four. With a batch size of one, the sequences are handled one after another: the system completes prefilling and decoding for the first sequence before starting the second, and so on. With a batch size of four, all four sequences are processed in parallel, sharing the same computational passes. This parallel execution increases throughput because it makes better use of the hardware's capacity. The cost is that shorter sequences must be padded to the length of the longest one, so some computation is spent on padding rather than on useful tokens.
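The comparison can be sketched with a toy cost model. The sketch below is illustrative only: the function name `batched_steps` and the sequence lengths are hypothetical, and it assumes that one computational pass advances every sequence in the batch by one token and costs the same regardless of batch size. Real systems process prefilling differently from decoding and have finite hardware capacity, so the numbers show the trend rather than measured performance.

```python
def batched_steps(lengths, batch_size):
    """Count passes and padded slots under a toy cost model.

    Assumption (not from the source): one pass advances every sequence
    in the batch by one token, and each pass costs the same no matter
    how many sequences are in the batch.
    """
    steps = 0
    padded = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        # The batch runs until its longest sequence is finished.
        steps += longest
        # Shorter sequences occupy padded slots for the remaining passes.
        padded += longest * len(batch) - sum(batch)
    return steps, padded


# Hypothetical token counts for four sequences.
lengths = [3, 5, 2, 4]
total_tokens = sum(lengths)

for batch_size in (1, 4):
    steps, padded = batched_steps(lengths, batch_size)
    throughput = total_tokens / steps  # useful tokens per pass
    print(f"batch_size={batch_size}: steps={steps}, "
          f"padded_slots={padded}, tokens_per_step={throughput:.2f}")
```

Under these assumptions, a batch size of one needs 14 passes with no padding, while a batch size of four needs only 5 passes but spends 6 slots on padding. The useful tokens per pass rise from 1.0 to 2.8, which is the trade-off described above.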

Updated 2025-10-09

Tags

Ch.5 Inference - Foundations of Large Language Models

Computing Sciences