Concept
Central Parameter Server Bottleneck
In multi-machine distributed training, a single central parameter server becomes a significant bandwidth bottleneck. Every worker must communicate with this one server to send its gradients and receive the updated parameters, yet the server's network bandwidth is finite and comparatively low. If there are $m$ worker machines, the time required to send all gradients to the central server scales linearly with $m$, resulting in an update time of $\mathcal{O}(m)$. This bottleneck severely limits the scalability of synchronous distributed optimization, because the central server cannot keep up with the data transfer demands of many simultaneous workers.
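The linear scaling can be illustrated with a rough cost model. The sketch below is not taken from the source: the function name `central_server_sync_time`, the gradient size, and the bandwidth figure are hypothetical values chosen for illustration, and the model ignores latency, the broadcast of updated parameters back to the workers, and any overlap of communication with computation.

```python
def central_server_sync_time(num_workers, gradient_bytes, server_bandwidth):
    """Estimate the seconds a single server needs to receive all gradients.

    Assumes (hypothetically) that the server's link is the only bottleneck,
    so the gradients from all workers share one link of
    `server_bandwidth` bytes per second.
    """
    if num_workers < 0 or gradient_bytes < 0:
        raise ValueError("num_workers and gradient_bytes must be non-negative")
    if server_bandwidth <= 0:
        raise ValueError("server_bandwidth must be positive")
    total_bytes = num_workers * gradient_bytes  # grows linearly with worker count
    return total_bytes / server_bandwidth


if __name__ == "__main__":
    GRADIENT_BYTES = 100_000_000      # assumed: 100 MB of gradients per worker
    SERVER_BANDWIDTH = 1_000_000_000  # assumed: 1 GB/s into the server
    for workers in (1, 2, 4, 8, 16):
        seconds = central_server_sync_time(workers, GRADIENT_BYTES, SERVER_BANDWIDTH)
        print(f"{workers:2d} workers: {seconds:.2f} s to collect gradients")
```

Under these assumptions, doubling the number of workers doubles the time spent collecting gradients, which is the linear growth described above.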
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L