Parallelization on Multiple GPUs
When training deep neural networks on multiple GPUs, the computation and memory requirements must be split across devices. This speeds up training and makes it possible to fit models that exceed a single GPU's memory. There are three main parallelization strategies:

- Network partitioning places consecutive groups of layers on different GPUs, so that each GPU processes the output of the one before it.
- Layerwise partitioning splits the work within a single layer across GPUs. For example, each of 4 GPUs might compute 16 of a layer's 64 channels.
- Data parallelism gives every GPU a full replica of the model and a different shard of each minibatch. Gradients are aggregated across GPUs before each parameter update.

In most cases data parallelism is the simplest and most widely used approach, provided each GPU has enough memory to hold the entire model.
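The data-parallel update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real multi-GPU implementation. The "devices" are simulated with plain Python lists, the model is a one-variable linear regression, and the helper names (`grad_mse`, `split`, `allreduce_mean`, `data_parallel_step`) are hypothetical. In a real framework each replica would live on its own GPU, and the averaging step would be performed by a collective-communication library such as NCCL.

```python
import math

# Hypothetical sketch: GPUs are simulated in plain Python; no real device API is used.

def grad_mse(w, b, xs, ys):
    """Gradient of 0.5 * mean((w*x + b - y)^2) with respect to (w, b)."""
    n = len(xs)
    gw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def split(batch, k):
    """Partition a minibatch into k equal shards (assumes len(batch) % k == 0)."""
    size = len(batch) // k
    return [batch[i * size:(i + 1) * size] for i in range(k)]

def allreduce_mean(grads):
    """Average per-device gradients; stands in for a real allreduce."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))

def data_parallel_step(params, xs, ys, k, lr):
    # Every simulated device holds a full replica of the parameters.
    replicas = [params for _ in range(k)]
    x_shards, y_shards = split(xs, k), split(ys, k)
    # Each device computes gradients on its own shard only.
    local = [grad_mse(*p, xs_i, ys_i)
             for p, xs_i, ys_i in zip(replicas, x_shards, y_shards)]
    # Aggregate gradients so that all replicas apply the same update.
    gw, gb = allreduce_mean(local)
    w, b = params
    return (w - lr * gw, b - lr * gb)

# Usage: one step on 4 simulated devices vs. the same step on a single device.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
multi = data_parallel_step((0.0, 0.0), xs, ys, k=4, lr=0.01)
single = data_parallel_step((0.0, 0.0), xs, ys, k=1, lr=0.01)
print(all(math.isclose(a, b) for a, b in zip(multi, single)))  # → True
```

Because the loss is a mean and the shards have equal size, averaging the per-shard gradients reproduces the full-minibatch gradient. Every replica therefore stays identical after the update. With unequal shards, the local gradients would need to be weighted by shard size instead.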
Tags
D2L
Dive into Deep Learning @ D2L
Related
CPU vs. GPU Architecture in Deep Learning
AlexNet Convolutional Neural Network
cuda-convnet
General-Purpose GPUs (GPGPUs)
GPU Hardware Configurations in Deep Learning
High-Bandwidth GPU Memory Technologies
Hardware Accelerators for Inference
Hardware Accelerators for Training
NVIDIA Collective Communications Library (NCCL)
Parallelization on Multiple GPUs