Analyzing a Data Parallelism Implementation
A machine learning team is implementing a distributed training system with 4 parallel workers. Each training step starts with a mini-batch of 1,000 unique data samples. The team's distribution script assigns each of the 4 workers a random subset of 250 samples drawn from the full mini-batch. Because each subset is sampled independently, the same sample may be assigned to multiple workers, and some samples may not be assigned to any worker within the same training step.
Based on the principles of distributing data for parallel processing, identify the fundamental flaw in this team's data distribution strategy and explain why it will lead to an incorrect or inefficient model update for the mini-batch.
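The contrast at the heart of the question can be made concrete in code. The sketch below (a hypothetical illustration, not the team's actual script) shows the flawed independent-sampling approach next to the standard alternative of shuffling once and splitting the mini-batch into disjoint shards, so that every sample is processed exactly once per step:

```python
import random

def flawed_assignment(batch, n_workers=4, per_worker=250):
    # Flawed: each worker's subset is sampled independently from the full
    # mini-batch, so the same sample can land on several workers and some
    # samples can be missed entirely in a given step.
    return [random.sample(batch, per_worker) for _ in range(n_workers)]

def disjoint_partition(batch, n_workers=4):
    # Standard approach: shuffle once, then split into non-overlapping
    # shards that together cover the whole mini-batch exactly once.
    shuffled = batch[:]
    random.shuffle(shuffled)
    size = len(shuffled) // n_workers
    return [shuffled[i * size:(i + 1) * size] for i in range(n_workers)]

batch = list(range(1000))
shards = disjoint_partition(batch)
# Full coverage, no duplicates: the shards reconstruct the mini-batch.
assert sorted(x for shard in shards for x in shard) == batch
```

With the disjoint partition, averaging the workers' gradients yields the gradient of the whole mini-batch; with the flawed scheme, duplicated samples are over-weighted and omitted samples contribute nothing, so the aggregated update no longer corresponds to the intended mini-batch.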
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
A training algorithm processes a large mini-batch of 512 data samples by distributing the workload across 8 parallel workers. Each worker has a complete copy of the model. How is the data from this single large mini-batch handled by the system for one computation step?
Data Distribution in Parallel Training