Short Answer

Backward Pass Latency in Sequential Model Parallelism

A deep neural network is trained using a setup where consecutive layers are distributed across different workers. An engineer observes that during the backward pass, the worker holding the initial layers of the model is the last one to complete its computations for any given data batch. Based on the data flow of this process, explain why this observation is expected.
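The data flow the question refers to can be sketched with a toy simulation. This is a minimal illustration under assumed uniform per-worker compute times (the worker count and costs are hypothetical, not from the question): activations flow through workers 0 → N−1 in the forward pass, so gradients must retrace them in reverse, meaning worker 0's backward computation starts, and therefore finishes, last.

```python
# Toy simulation of sequential (pipeline-style) model parallelism.
# Assumptions (hypothetical): 4 workers, equal compute cost per worker
# for each direction, and strictly sequential dependencies.

N_WORKERS = 4
COMPUTE_TIME = 1.0  # assumed uniform per-worker cost for one direction

def backward_finish_times(n_workers, compute_time):
    """Return the simulated clock time at which each worker finishes
    its backward computation for a single batch."""
    # Forward pass: activations flow worker 0 -> 1 -> ... -> n-1,
    # so the forward pass completes on the last worker.
    clock = n_workers * compute_time
    finish = {}
    # Backward pass: gradients flow worker n-1 -> ... -> 0;
    # each worker must wait for the gradient from its successor.
    for w in reversed(range(n_workers)):
        clock += compute_time
        finish[w] = clock
    return finish

times = backward_finish_times(N_WORKERS, COMPUTE_TIME)
# Worker 0 (holding the initial layers) finishes after every other worker.
assert times[0] == max(times.values())
print(times)  # {3: 5.0, 2: 6.0, 1: 7.0, 0: 8.0}
```

The simulation makes the expected observation concrete: because backpropagation consumes gradients produced by the layer above, the worker holding the initial layers cannot even begin its backward work until every downstream worker has finished, so it is necessarily the last to complete.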


Updated 2025-10-10


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science