Case Study

Analyzing a Data Parallelism Implementation

A machine learning team is implementing a distributed training system with 4 parallel workers. For each training step, they start with a mini-batch of 1,000 unique data samples. Their distribution script assigns each of the 4 workers a random subset of 250 samples drawn from the full mini-batch. Because each worker's subset is sampled independently, the same data sample can be assigned to multiple workers, and some samples may not be assigned to any worker within the same training step.
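For concreteness, here is a minimal Python sketch of the distribution scheme as the scenario describes it. The function name, constants, and the use of random.sample are illustrative assumptions, not the team's actual script:

```python
import random

NUM_WORKERS = 4           # 4 parallel workers
BATCH_SIZE = 1000         # 1,000 unique samples per mini-batch
SAMPLES_PER_WORKER = 250  # 4 x 250 = 1,000, so a disjoint split would cover the batch exactly

def distribute_batch(batch):
    """Assign each worker an independent random subset of the mini-batch.

    Because every subset is drawn separately from the full batch, the
    draws are uncoordinated: a sample can land on several workers, while
    other samples are skipped entirely in the same training step.
    """
    return [random.sample(batch, SAMPLES_PER_WORKER) for _ in range(NUM_WORKERS)]

mini_batch = list(range(BATCH_SIZE))  # stand-in for 1,000 unique samples
shards = distribute_batch(mini_batch)
covered = set().union(*shards)
print(f"Unique samples covered this step: {len(covered)} of {BATCH_SIZE}")
```

Running the sketch typically reports far fewer than 1,000 unique samples covered, since each worker's draw ignores the others'; this coverage gap (and the duplication that accompanies it) is the behavior the question asks you to analyze.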

Based on the principles of distributing data for parallel processing, identify the fundamental flaw in this team's data distribution strategy and explain why it will lead to an incorrect or inefficient model update for the mini-batch.
