A training algorithm processes a large mini-batch of 512 data samples by distributing the workload across 8 parallel workers. Each worker has a complete copy of the model. How is the data from this single large mini-batch handled by the system for one computation step?
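The scenario above describes standard data parallelism: the 512-sample mini-batch is split into 8 equal, non-overlapping shards of 64 samples, each worker computes gradients on its own shard using its local model copy, and the local gradients are averaged so every copy applies the same update. A minimal sketch, assuming plain Python stand-ins for the samples and a toy "gradient" (the names `shards`, `local_grads`, and `global_grad` are illustrative, not from any particular framework):

```python
batch_size = 512
num_workers = 8
shard_size = batch_size // num_workers  # 64 samples per worker

# Stand-in for the 512 samples in the mini-batch.
mini_batch = list(range(batch_size))

# Split the single mini-batch into equal, non-overlapping shards,
# one per worker.
shards = [mini_batch[i * shard_size:(i + 1) * shard_size]
          for i in range(num_workers)]

# Each worker computes a local "gradient" on its shard; here the toy
# gradient is just the mean of the shard's values.
local_grads = [sum(s) / len(s) for s in shards]

# The local gradients are averaged (in practice via an all-reduce),
# so every model copy receives the identical update.
global_grad = sum(local_grads) / num_workers

print(shard_size)   # 64
print(len(shards))  # 8
print(global_grad)  # 255.5, identical to the full-batch mean
```

Because each shard is disjoint and equally sized, the averaged result equals what a single worker would compute on the whole 512-sample batch, which is why data parallelism preserves the mathematics of the update while dividing the work.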
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing a Data Parallelism Implementation
Data Distribution in Parallel Training