Concept
Scaling with Multiple Parameter Servers
A single central parameter server becomes a bandwidth bottleneck in multi-machine training, because it must receive gradients from all $m$ workers, which takes $\mathcal{O}(m)$ time. To overcome this, the parameter synchronization workload can be distributed across multiple servers. With $n$ parameter servers, each server stores and updates only a fraction of the parameters, specifically $\mathcal{O}(1/n)$. Consequently, the total time required for parameter updates and optimization across the $m$ workers is reduced to $\mathcal{O}(m/n)$. Choosing $n = m$ makes this time constant regardless of the total number of workers. In practice, systems often achieve this by using the same machines simultaneously as both workers and servers.
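The scaling argument can be checked with a small back-of-the-envelope model. The sketch below is illustrative only: the function names, the round-robin sharding rule, the 160 MB gradient size, and the 1000 MB/s per-server bandwidth are all assumptions chosen for the example, not part of any particular framework. It assumes each of the m workers sends every server its shard of the gradient, and that each server's incoming link is the only bottleneck.

```python
import math


def server_for_param(param_index: int, num_servers: int) -> int:
    """Hypothetical round-robin sharding: parameter i lives on server i mod n."""
    return param_index % num_servers


def sync_time(num_workers: int, num_servers: int,
              grad_mb: float = 160.0,          # assumed total gradient size
              bandwidth_mb_s: float = 1000.0   # assumed per-server bandwidth
              ) -> float:
    """Seconds for one server to receive its shard from every worker.

    Each server holds grad_mb / num_servers of the parameters and receives
    that much data from each of the num_workers workers, so the time grows
    as num_workers / num_servers.
    """
    if num_workers < 1 or num_servers < 1:
        raise ValueError("need at least one worker and one server")
    shard_mb = grad_mb / num_servers
    return num_workers * shard_mb / bandwidth_mb_s


if __name__ == "__main__":
    for m in (8, 64):
        print(f"m={m}: one server {sync_time(m, 1):.2f}s, "
              f"m servers {sync_time(m, m):.2f}s")
```

Under these assumptions, the single-server time grows linearly with the number of workers, while setting the number of servers equal to the number of workers keeps the time fixed at the cost of sending one full gradient over one link.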
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L