Learn Before
In a data parallelism setup, doubling the number of workers will always result in halving the training time.
0
1
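The claim above can be checked with a small Amdahl's-law-style sketch. This is a minimal illustration, not a model of any real system: the serial fraction and per-worker communication overhead values are illustrative assumptions.

```python
def training_time(base_hours, workers, serial_frac=0.05, comm_per_worker=0.1):
    """Toy cost model: the parallelizable portion shrinks with more
    workers, but the serial portion and per-worker communication
    overhead do not (all parameter values are illustrative)."""
    parallel = base_hours * (1 - serial_frac) / workers
    serial = base_hours * serial_frac
    comm = comm_per_worker * workers
    return parallel + serial + comm

one = training_time(40, 1)   # 40.1 hours
two = training_time(40, 2)   # 21.2 hours: less than `one`, but more than half of it
```

Because the serial and communication terms do not shrink with the worker count, doubling the workers reduces the time by less than half, which is why the blanket "always" in the statement fails.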
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning model takes 40 hours to train on a single processing unit. If the training process is parallelized across 8 identical processing units, what is the expected training time? Assume ideal conditions where the overhead from distributing data and coordinating the units is negligible.
Analyzing Sub-Optimal Speed-up in Parallel Training
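Under the idealized assumption stated in the related question (negligible distribution and coordination overhead), the expected time is simply the single-unit time divided by the number of units. A one-line sketch:

```python
def ideal_parallel_time(serial_hours, units):
    # Ideal (linear) speedup: work divides evenly and overhead is zero.
    return serial_hours / units

print(ideal_parallel_time(40, 8))  # → 5.0 hours
```

The 5-hour answer holds only under that ideal-speedup assumption; in practice, communication and synchronization costs make the real time longer.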