Scaling with Multiple Parameter Servers

To overcome the bandwidth bottleneck of a single central parameter server in multi-machine training, the parameter synchronization workload can be distributed across multiple servers. With $n$ parameter servers, each server stores and updates only an $\mathcal{O}(1/n)$ fraction of the parameters. Consequently, the total time required for parameter updates and optimization across $m$ workers drops from $\mathcal{O}(m)$ to $\mathcal{O}(m/n)$. Choosing $n = m$ makes this time constant, independent of the number of workers. In practice, this is achieved by using the same machines as both workers and servers.
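
The scaling argument above can be sketched as a toy cost model. This is a minimal illustration, not any framework's implementation: the function names `shard_bounds` and `sync_time`, the contiguous sharding scheme, and the `bandwidth` parameter (parameters received per second by one server) are all assumptions made for the example. The model also ignores latency and the cost of sending updated parameters back to the workers.

```python
def shard_bounds(num_params, n):
    """Split range(num_params) into n contiguous, near-equal shards.

    Hypothetical helper: returns a list of (start, end) index pairs,
    one per parameter server.
    """
    base, extra = divmod(num_params, n)
    bounds = []
    start = 0
    for i in range(n):
        # The first `extra` shards take one additional parameter.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def sync_time(num_params, m, n, bandwidth):
    """Assumed cost model: time for the busiest server to receive
    its shard of the gradients from all m workers."""
    largest = max(end - start for start, end in shard_bounds(num_params, n))
    return m * largest / bandwidth


# One server handling 8 workers versus 8 servers handling 8 workers.
single = sync_time(num_params=1000, m=8, n=1, bandwidth=100)
sharded = sync_time(num_params=1000, m=8, n=8, bandwidth=100)
print(single, sharded)
```

Under this model the time grows linearly with `m` when `n` is fixed at 1, and stays the same when `n` grows together with `m`, which is the motivation for letting every machine act as both worker and server.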

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L