Learn Before
Analyzing Sub-Optimal Speed-up in Parallel Training
Based on the principles of parallel processing, analyze the potential reasons for the significant difference between the ideal, expected training time and the actual observed training time in this scenario.
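One way to frame the analysis is Amdahl's law: any serial fraction of the work caps the achievable speed-up, so the observed time exceeds the ideal even before communication costs are counted. The sketch below is illustrative only; the 95% parallel fraction and the example numbers are assumptions, not values given in the card.

```python
# Sketch of why observed speed-up falls short of the ideal N-fold speed-up.
# Amdahl's law: if a fraction p of the work parallelizes and (1 - p) is
# inherently serial, N workers give a speed-up of 1 / ((1 - p) + p / N).

def ideal_time(serial_time, workers):
    """Ideal training time if the work parallelizes perfectly."""
    return serial_time / workers

def amdahl_speedup(parallel_fraction, workers):
    """Speed-up capped by the serial fraction of the workload."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Example (assumed numbers): a 40-hour job on 8 workers,
# with 95% of the work parallelizable.
t1 = 40.0
n = 8
print(ideal_time(t1, n))             # 5.0 hours under ideal conditions
print(t1 / amdahl_speedup(0.95, n))  # 6.75 hours once the serial part is counted
```

In practice the gap is usually wider still, since gradient synchronization and data-distribution overhead grow with the number of workers and are not captured by Amdahl's law alone.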
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning model takes 40 hours to train on a single processing unit. If the training process is parallelized across 8 identical processing units, what is the expected training time? Assume ideal conditions where the overhead from distributing data and coordinating the units is negligible.
Analyzing Sub-Optimal Speed-up in Parallel Training
In a data parallelism setup, doubling the number of workers will always result in halving the training time.
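Under the ideal assumption stated in the first related question above, the expected time is a straight division of serial time by worker count. A minimal sketch (variable names are my own):

```python
# Ideal linear scaling, as the related question assumes:
# negligible overhead from distributing data and coordinating units.
single_unit_hours = 40
num_units = 8
expected_hours = single_unit_hours / num_units  # 40 / 8
print(f"{expected_hours} hours")  # 5.0 hours
```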