Learn Before
A 4-layer neural network is distributed across two workers using layer-wise model parallelism (Worker 1 holds layers 1-2, Worker 2 holds layers 3-4). Arrange the following events in the correct chronological order for a single training step, which includes one forward and one backward pass.
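The ordering the question asks about can be seen by simulating both workers in one process. The sketch below is illustrative only (the event names, layer sizes, and the `Worker` class are assumptions, not the card's official answer key): each "worker" owns two linear layers, and the event log records the order that the data dependencies force on a single training step.

```python
import numpy as np

# Minimal single-process sketch of layer-wise model parallelism:
# two "workers" each own two linear layers; the event log records the
# chronological order of one forward + one backward pass.

rng = np.random.default_rng(0)
log = []

class Worker:
    def __init__(self, name, dims):
        self.name = name
        # two plain linear layers per worker (no bias, for brevity)
        self.W = [rng.standard_normal((dims[i], dims[i + 1])) * 0.1
                  for i in range(2)]

    def forward(self, x):
        self.acts = [x]                    # cache inputs for backward
        for W in self.W:
            x = x @ W
            self.acts.append(x)
        log.append(f"{self.name} forward")
        return x

    def backward(self, grad_out):
        self.grads = []
        # walk the layers in reverse: gradients flow from last layer to first
        for W, a in zip(reversed(self.W), reversed(self.acts[:-1])):
            self.grads.append(a.T @ grad_out)  # dL/dW for this layer
            grad_out = grad_out @ W.T          # dL/d(layer input)
        log.append(f"{self.name} backward")
        return grad_out

w1 = Worker("Worker 1 (layers 1-2)", [8, 8, 8])
w2 = Worker("Worker 2 (layers 3-4)", [8, 8, 1])

x, target = rng.standard_normal((4, 8)), rng.standard_normal((4, 1))

h = w1.forward(x)                             # 1. W1 forward (layers 1-2)
log.append("send activations W1 -> W2")       # 2. activation transfer
y = w2.forward(h)                             # 3. W2 forward (layers 3-4)
grad = 2 * (y - target) / len(y)              #    MSE loss gradient
g_h = w2.backward(grad)                       # 4. W2 backward (layers 4-3)
log.append("send activation grads W2 -> W1")  # 5. gradient transfer
w1.backward(g_h)                              # 6. W1 backward (layers 2-1)

for i, event in enumerate(log, 1):
    print(i, event)
```

Note that each event depends on the previous one's output (Worker 2 cannot start until it receives Worker 1's activations, and Worker 1 cannot run its backward pass until it receives the activation gradients), so neither worker can skip ahead, which is exactly why sequential model parallelism leaves workers idle.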
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An 8-layer neural network is distributed across 4 workers, with each worker holding 2 consecutive layers (Worker 1 has layers 1-2, Worker 2 has layers 3-4, etc.). During the forward pass for a single data batch, what is the state of Worker 1 and Worker 4 at the exact moment Worker 3 is actively computing its layers (layers 5-6)?
Backward Pass Latency in Sequential Model Parallelism