Analyzing a Data Parallelism Implementation
A machine learning team is implementing a distributed training system with 4 parallel workers. Each training step starts with a mini-batch of 1,000 unique data samples. The team's distribution script assigns each of the 4 workers a random subset of 250 samples drawn from the full mini-batch. Because each subset is sampled independently, the same sample may be assigned to multiple workers, and some samples may not be assigned to any worker within the same training step.
Based on the principles of distributing data for parallel processing, identify the fundamental flaw in this team's data distribution strategy and explain why it will lead to an incorrect or inefficient model update for the mini-batch.
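The contrast at the heart of the question can be made concrete in code. The sketch below (a hypothetical illustration, not the team's actual script) shows the flawed independent-sampling approach next to the standard alternative of shuffling once and splitting the mini-batch into disjoint shards, so that every sample is processed exactly once per step:

```python
import random

def flawed_assignment(batch, n_workers=4, per_worker=250):
    # Flawed: each worker's subset is sampled independently from the full
    # mini-batch, so the same sample can land on several workers and some
    # samples can be missed entirely in a given step.
    return [random.sample(batch, per_worker) for _ in range(n_workers)]

def disjoint_partition(batch, n_workers=4):
    # Standard approach: shuffle once, then split into non-overlapping
    # shards that together cover the whole mini-batch exactly once.
    shuffled = batch[:]
    random.shuffle(shuffled)
    size = len(shuffled) // n_workers
    return [shuffled[i * size:(i + 1) * size] for i in range(n_workers)]

batch = list(range(1000))
shards = disjoint_partition(batch)
# Full coverage, no duplicates: the shards reconstruct the mini-batch.
assert sorted(x for shard in shards for x in shard) == batch
```

With the disjoint partition, averaging the workers' gradients yields the gradient of the whole mini-batch; with the flawed scheme, duplicated samples are over-weighted and omitted samples contribute nothing, so the aggregated update no longer corresponds to the intended mini-batch.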
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
A training algorithm processes a large mini-batch of 512 data samples by distributing the workload across 8 parallel workers. Each worker has a complete copy of the model. How is the data from this single large mini-batch handled by the system for one computation step?
Data Distribution in Parallel Training