Concept

Decomposition of NVLink Networks into Rings

To synchronize data efficiently across GPUs interconnected via NVLink, the network connectivity can be decomposed into distinct ring structures. For example, an 8-GPU NVLink network can be organized into two separate rings: one ring whose links carry double NVLink bandwidth and a second ring whose links carry regular bandwidth. Because data moves directly between GPUs along these rings, synchronization avoids the PCIe bus bottleneck and makes fuller use of the aggregate bandwidth that the NVLink connections provide.
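
The decomposition can be illustrated with a small pure-Python simulation. This is a minimal sketch under stated assumptions, not a description of any library's implementation: the two ring orderings `RING_A` and `RING_B`, the 2:1 split of the gradient between them, and all function names are hypothetical choices made for illustration. The sketch checks that the two rings share no link, then runs a textbook ring all-reduce (reduce-scatter followed by all-gather) on each ring.

```python
# Assumed ring orderings over GPUs 1..8 (illustrative, not taken from the text).
RING_A = [1, 2, 3, 4, 5, 6, 7, 8]  # assumed double-bandwidth ring
RING_B = [1, 4, 6, 3, 5, 8, 2, 7]  # assumed regular-bandwidth ring


def ring_edges(ring):
    """Return the set of undirected links used by a ring."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def ring_allreduce(buffers, ring):
    """Sum per-GPU vectors of length len(ring) by passing chunks around the ring.

    buffers maps GPU id -> list with one number per chunk.
    """
    n = len(ring)
    data = {g: list(buffers[g]) for g in ring}

    # Reduce-scatter: after n - 1 steps, the GPU at position i holds the
    # complete sum for chunk (i + 1) % n.
    for s in range(n - 1):
        msgs = [(ring[(i + 1) % n], (i - s) % n, data[ring[i]][(i - s) % n])
                for i in range(n)]
        for dst, chunk, value in msgs:
            data[dst][chunk] += value

    # All-gather: circulate the finished chunks until every GPU has all of them.
    for s in range(n - 1):
        msgs = [(ring[(i + 1) % n], (i + 1 - s) % n,
                 data[ring[i]][(i + 1 - s) % n])
                for i in range(n)]
        for dst, chunk, value in msgs:
            data[dst][chunk] = value

    return data


def two_ring_allreduce(grads):
    """Split each gradient 2:1 between the rings (assumed bandwidth ratio)."""
    n = len(RING_A)
    out = {g: [] for g in grads}
    # Two slices travel on the double-bandwidth ring, one on the regular ring.
    for k, ring in enumerate((RING_A, RING_A, RING_B)):
        part = {g: grads[g][k * n:(k + 1) * n] for g in grads}
        reduced = ring_allreduce(part, ring)
        for g in grads:
            out[g].extend(reduced[g])
    return out


# Toy gradients: 8 GPUs, 24 integers each.
grads = {g: [g * 100 + j for j in range(24)] for g in range(1, 9)}
result = two_ring_allreduce(grads)
expected = [sum(grads[g][j] for g in grads) for j in range(24)]

print(ring_edges(RING_A).isdisjoint(ring_edges(RING_B)))
print(all(result[g] == expected for g in grads))
```

In this sketch the slices are processed one after another for clarity. On real hardware the two rings would operate concurrently, which is where the benefit of giving the faster ring a larger share of the data would come from.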

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L