Data Distribution in Parallel Training
A training process uses data parallelism with 4 workers to process a mini-batch containing 1000 data samples. Describe the set of data batches that will be created and distributed for a single concurrent computation step. Your description should include the total number of batches created from the mini-batch and the number of data samples within each of those batches.
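The scenario above can be sketched in a few lines of Python. This is a hypothetical illustration (the function name `shard_minibatch` is an assumption, not part of any specific framework): an even split of a 1000-sample mini-batch across 4 data-parallel workers yields 4 shards of 250 samples each, one per worker, for a single concurrent step.

```python
# Hypothetical sketch: evenly sharding a mini-batch across data-parallel workers.

def shard_minibatch(samples, num_workers):
    """Split a mini-batch into one equal-sized shard per worker."""
    shard_size = len(samples) // num_workers
    return [samples[i * shard_size:(i + 1) * shard_size]
            for i in range(num_workers)]

mini_batch = list(range(1000))                      # 1000 data samples
shards = shard_minibatch(mini_batch, num_workers=4)

print(len(shards))      # 4 shards, one per worker
print(len(shards[0]))   # 250 samples in each shard
```

Each worker then runs the forward and backward pass on its own 250-sample shard in parallel before gradients are aggregated.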
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A training algorithm processes a large mini-batch of 512 data samples by distributing the workload across 8 parallel workers. Each worker has a complete copy of the model. How is the data from this single large mini-batch handled by the system for one computation step?
Analyzing a Data Parallelism Implementation