Analyzing the Impact of Chunk Size on Training Throughput
A machine learning team is training a large model using a pipelined approach across several processors. They run two experiments with the same total amount of data:
- Experiment A: The data is divided into a very large number of extremely small chunks.
- Experiment B: The data is divided into a moderate number of medium-sized chunks.
Which experiment is likely to achieve higher overall training throughput, and why? Explain the two opposing factors that create a trade-off when determining the optimal chunk size.
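The trade-off the question asks about can be sketched with a toy analytic model (an illustrative assumption, not a formula from the course). Suppose a batch of `W` work units is split into `n` equal chunks and streamed through `S` pipeline stages, where each chunk also pays a fixed per-chunk overhead `h` (kernel launch, communication, scheduling). Under a simple fill-and-drain schedule, total time is roughly `(n + S - 1) * (W/n + h)`: few large chunks leave a big pipeline "bubble" while the stages fill and drain, but very many tiny chunks make the fixed overhead dominate.

```python
# Toy model of pipelined training time (illustrative assumption, not
# a formula from the course material).
#   W = total work units in the batch
#   n = number of chunks the batch is split into
#   S = number of pipeline stages
#   h = fixed per-chunk overhead (launch, communication, scheduling)
# Fill-and-drain schedule: the last chunk finishes after (n + S - 1)
# chunk-slots, and each slot costs W/n compute plus h overhead.

def pipeline_time(n_chunks: int, total_work: float,
                  n_stages: int, overhead: float) -> float:
    per_chunk = total_work / n_chunks + overhead
    return (n_chunks + n_stages - 1) * per_chunk

if __name__ == "__main__":
    W, S, h = 1024.0, 4, 1.0
    tiny = pipeline_time(4096, W, S, h)    # Experiment A: many tiny chunks
    medium = pipeline_time(32, W, S, h)    # Experiment B: moderate chunks
    print(f"tiny chunks:   {tiny:.1f}")
    print(f"medium chunks: {medium:.1f}")
```

With these made-up constants, the medium-chunk run finishes far sooner: the tiny-chunk run pays the overhead `h` 4096 times, which swamps the bubble savings.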
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
A team is training a large computational model by splitting each data batch into smaller chunks and processing them sequentially across multiple pipelined hardware stages. As they decrease the chunk size, overall training speed initially increases; when the chunks become extremely small, however, training speed unexpectedly begins to drop. What is the most likely cause of this drop in performance at extremely small chunk sizes?
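The U-shaped behavior this question describes can be reproduced with a toy cost model (an illustrative assumption, not the course's formula): total time `T(n) = (n + S - 1) * (W/n + h)` for `W` work units, `S` stages, `n` chunks, and fixed per-chunk overhead `h`. Sweeping `n` shows speed improving as the pipeline bubble shrinks, then degrading once the per-chunk overhead dominates:

```python
# Sweep chunk counts under a toy fill-and-drain pipeline model
# (illustrative constants, not measured values).
def pipeline_time(n: int, W: float, S: int, h: float) -> float:
    # (n + S - 1) chunk-slots, each costing W/n compute plus h overhead
    return (n + S - 1) * (W / n + h)

W, S, h = 1024.0, 4, 1.0
times = {n: pipeline_time(n, W, S, h)
         for n in (1, 4, 16, 32, 64, 256, 1024, 4096)}
for n, t in times.items():
    print(f"n={n:5d}  time={t:8.1f}")
best = min(times, key=times.get)
print("best chunk count in sweep:", best)
```

In this sweep the optimum sits at a moderate chunk count; both extremes (one giant chunk, thousands of tiny ones) are slower, matching the observed drop.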
Optimizing Pipelined Training Throughput
Analyzing the Impact of Chunk Size on Training Throughput
A team is training a large computational model using a pipelined approach where a data batch is divided into smaller chunks for sequential processing across multiple hardware stages. They test two strategies:
- Strategy X: Uses a very large number of extremely small chunks.
- Strategy Y: Uses a moderate number of medium-sized chunks.
They observe that Strategy Y results in a significantly higher overall training throughput. Which of the following statements provides the most accurate evaluation of this outcome?