Short Answer

Analyzing the Impact of Chunk Size on Training Throughput

A machine learning team is training a large model using pipeline parallelism across several processors, splitting the data into chunks (micro-batches) that flow through the pipeline. They run two experiments with the same total amount of data:

  • Experiment A: The data is divided into a very large number of extremely small chunks.
  • Experiment B: The data is divided into a moderate number of medium-sized chunks.

Which experiment is likely to achieve higher overall training throughput, and why? Explain the two opposing factors that create a trade-off when determining the optimal chunk size.
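
Before answering, it can help to put rough numbers on the two opposing factors. The Python sketch below is a hypothetical back-of-the-envelope model, not part of the course material: the stage count, per-chunk overhead, and total work are made-up constants, and it ignores real scheduling details such as 1F1B ordering and overlapping communication with compute. It models one pass as (m + p - 1) pipeline steps, so few large chunks inflate the fill/drain bubble fraction (p - 1)/(m + p - 1), while many tiny chunks multiply the fixed per-chunk overhead.

```python
# Minimal analytic sketch of the chunk-size trade-off in pipeline
# parallelism. All constants are illustrative assumptions, and the
# (num_chunks + num_stages - 1)-step cost model is a simplification.

def pipeline_time(total_work_ms: float, num_chunks: int,
                  num_stages: int, per_chunk_overhead_ms: float) -> float:
    """Time to push all chunks through a simple linear pipeline.

    Each pipeline step processes one chunk on one stage, costing the
    chunk's compute time plus a fixed overhead (kernel launch,
    inter-stage communication). Filling and draining the pipeline adds
    (num_stages - 1) extra steps: the "bubble".
    """
    per_chunk_compute_ms = total_work_ms / num_chunks
    step_ms = per_chunk_compute_ms + per_chunk_overhead_ms
    return (num_chunks + num_stages - 1) * step_ms


if __name__ == "__main__":
    TOTAL_WORK_MS = 1000.0  # total per-stage compute for the whole batch (assumed)
    STAGES = 4              # pipeline depth (assumed)
    OVERHEAD_MS = 0.5       # fixed cost per chunk per stage (assumed)

    for m in (1, 4, 16, 64, 256, 4096):
        t = pipeline_time(TOTAL_WORK_MS, m, STAGES, OVERHEAD_MS)
        bubble = (STAGES - 1) / (m + STAGES - 1)  # idle fraction from fill/drain
        print(f"chunks={m:5d}  time={t:7.1f} ms  bubble={bubble:5.1%}")
```

With these illustrative constants, total time falls from 4002 ms at m = 1 to roughly 1080 ms around m = 64, then climbs past 3000 ms at m = 4096: throughput peaks at a moderate chunk count, which is exactly the trade-off the question asks you to explain.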


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy