Learn Before
Backward Pass Latency in Sequential Model Parallelism
A deep neural network is trained using a setup where consecutive layers are distributed across different workers. An engineer observes that during the backward pass, the worker holding the initial layers of the model is the last one to complete its computations for any given data batch. Based on the data flow of this process, explain why this observation is expected.
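The ordering the question probes can be sketched with a tiny timeline simulation. This is a hypothetical helper (the function name and worker count are illustrative, not from the card): in sequential model parallelism, activations flow from the first worker to the last during the forward pass, and gradients flow back from the last worker to the first during the backward pass, so the worker holding the initial layers computes first on the way forward and last on the way back.

```python
def training_step_order(num_workers):
    """Return the order in which workers compute during one
    forward + backward pass under layer-wise (sequential) model
    parallelism. Workers are numbered 1..num_workers, with
    Worker 1 holding the earliest layers."""
    forward = list(range(1, num_workers + 1))    # activations flow 1 -> N
    backward = list(range(num_workers, 0, -1))   # gradients flow N -> 1
    return forward + backward

# Worker 1 appears first in the forward phase and last in the
# backward phase, matching the engineer's observation.
print(training_step_order(4))  # [1, 2, 3, 4, 4, 3, 2, 1]
```

The key point the sketch makes concrete: Worker 1's backward step depends on gradients that must first propagate through every later worker, so it cannot finish before any of them.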
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An 8-layer neural network is distributed across 4 workers, with each worker holding 2 consecutive layers (Worker 1 has layers 1-2, Worker 2 has layers 3-4, etc.). During the forward pass for a single data batch, what is the state of Worker 1 and Worker 4 at the exact moment Worker 3 is actively computing its layers (layers 5-6)?
A 4-layer neural network is distributed across two workers using layer-wise model parallelism (Worker 1 holds layers 1-2, Worker 2 holds layers 3-4). Arrange the following events in the correct chronological order for a single training step, which includes one forward and one backward pass.
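For the two-worker case above, the dependency chain of one training step can be written out explicitly. This is a hypothetical event list (the event labels are illustrative, not taken from the card's answer choices), showing why the forward pass runs Worker 1 then Worker 2, while the backward pass reverses that order:

```python
# One training step on 2 workers under layer-wise model parallelism.
# Each event can only start once the previous one has produced
# the activations (forward) or gradients (backward) it needs.
events = [
    "Worker 1 forward pass (layers 1-2)",
    "Worker 2 forward pass (layers 3-4)",
    "Loss computed from the model output",
    "Worker 2 backward pass (layers 3-4)",
    "Worker 1 backward pass (layers 1-2)",
]
for step, event in enumerate(events, start=1):
    print(step, event)
```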