Concept

Parallelization on Multiple GPUs

When training deep neural networks on multiple GPUs, the computational workload and memory requirements must be distributed across devices, both to train efficiently and to work around the limits of a single GPU. There are three primary parallelization strategies. Network partitioning assigns consecutive layers to different GPUs. Layerwise partitioning splits the operations within a single layer across multiple GPUs. Data parallelism partitions the training data across GPUs while each GPU keeps a full replica of the model. In practice, data parallelism is the most convenient and widely used approach, provided each GPU has enough memory to hold the entire model.
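To make the data-parallel idea concrete, here is a minimal sketch in plain Python that simulates it without any GPUs. It assumes a one-dimensional linear model with squared-error loss, and every name in it (`shard_gradient_sum`, `split`, `data_parallel_step`) is hypothetical, chosen only for illustration. Each "device" is simply a shard of the minibatch; the per-shard gradients are summed and averaged, which stands in for the gradient aggregation step a real multi-GPU setup would perform.

```python
# Simulated data parallelism in plain Python: no GPUs are used here.
# Each "device" is just a shard of the minibatch (an assumption for illustration).


def shard_gradient_sum(w, b, xs, ys):
    """Summed gradient of 0.5 * (w*x + b - y)**2 over one shard."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb


def split(seq, k):
    """Split seq into k contiguous shards of near-equal size."""
    n = len(seq)
    bounds = [i * n // k for i in range(k + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(k)]


def data_parallel_step(w, b, xs, ys, num_devices, lr):
    """One SGD step: per-shard gradients, aggregation, one shared update."""
    # Every simulated device sees the same parameters (a full model replica)
    # but only its own slice of the data.
    partials = [
        shard_gradient_sum(w, b, x_shard, y_shard)
        for x_shard, y_shard in zip(split(xs, num_devices), split(ys, num_devices))
    ]
    # Aggregate the partial sums and average over the whole minibatch.
    gw = sum(p[0] for p in partials) / len(xs)
    gb = sum(p[1] for p in partials) / len(xs)
    # The same update is applied to every replica, so they stay in sync.
    return w - lr * gw, b - lr * gb


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0 * x + 1.0 for x in xs]

w_single, b_single = data_parallel_step(0.0, 0.0, xs, ys, num_devices=1, lr=0.01)
w_multi, b_multi = data_parallel_step(0.0, 0.0, xs, ys, num_devices=4, lr=0.01)

# The update should not depend on how many devices shared the minibatch.
print(abs(w_single - w_multi) < 1e-12, abs(b_single - b_multi) < 1e-12)
```

The point of the comparison at the end is that splitting a minibatch across devices and aggregating the gradients yields the same update as processing the whole minibatch on one device. This is why data parallelism can be added to existing training code with few changes, as long as the full model fits on each device.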

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L
