Learn Before
Concept

Layerwise Partitioning

Layerwise partitioning, commonly known as tensor parallelism, is a multiple-GPU training strategy that splits the computational work within individual network layers across different devices. For example, instead of computing all channels of a convolutional layer on a single GPU, the workload can be distributed so that multiple GPUs each compute a fraction of the channels. This approach scales effectively in terms of computation and enables the processing of larger networks by pooling the memory of multiple GPUs. However, because each layer's computation depends on the aggregated results from all other participating GPUs, this strategy requires a massive number of synchronization barriers and incurs exorbitant bandwidth costs for data transfer.

Image 0

0

1

Updated 2026-05-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L