Learn Before
Set of Distributed Data Batches in Data Parallelism
In data parallelism, a minibatch of training samples, denoted by $\mathcal{D}$, is divided into $k$ smaller batches, which can be denoted by $\{\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_k\}$. After the division, these smaller batches are distributed to $k$ separate workers, each receiving one of the batches, allowing them to work at the same time.
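A minimal sketch of this division, assuming NumPy arrays and a hypothetical worker count of k = 4 (the array sizes and names are illustrative, not from the text):

import numpy as np

# A minibatch of 32 training samples with 16 features each
# (sizes are hypothetical, chosen only for illustration).
minibatch = np.random.rand(32, 16)

k = 4  # number of workers

# Split the minibatch into k equally sized smaller batches;
# batches[i] is the batch distributed to worker i.
batches = np.array_split(minibatch, k, axis=0)

for i, batch in enumerate(batches):
    print(f"worker {i} receives a batch of shape {batch.shape}")

Each worker then runs the forward and backward pass on its own batch at the same time as the others.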

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gradient Descent Update Rule
Set of Distributed Data Batches in Data Parallelism
Ideal Speed-up in Data Parallelism
A team is training a neural network using a technique where a large batch of data is split equally among 8 machines. Each machine has a full, identical copy of the network model. During a training step, each machine processes its portion of the data and calculates a set of proposed parameter updates. Given this setup, what is the most critical subsequent action to ensure the entire system learns effectively from the full batch of data?
Distributed Gradient Calculation
A single training step is performed using a technique where a mini-batch of data is processed in parallel across multiple machines. Each machine holds a complete copy of the model. Arrange the following events in the correct chronological order for one such training step.
A machine learning team is training a large neural network on a massive dataset. To accelerate the process, they employ a strategy where the training data is split across 16 GPUs. Each GPU holds a complete copy of the model and processes its own subset of the data. After each forward and backward pass, the results from all GPUs are combined before updating the model's parameters. The team observes that while using 8 GPUs provided a nearly 8x speed-up compared to a single GPU, scaling to 16 GPUs only resulted in a 10x total speed-up. Based on the principles of the training strategy described, what is the most likely bottleneck causing this diminishing return in performance when scaling from 8 to 16 GPUs?
Evaluating a Training Strategy
Your team must train a 30B-parameter LLM on a sing...
You are on-call for an internal LLM training platf...
Your team is training a 70B-parameter LLM on 8 GPU...
You’re advising an internal platform team that mus...
Designing a Distributed Training Plan Under Memory, Throughput, and Stability Constraints
Postmortem and Redesign of a Distributed LLM Training Run with Divergence and Low GPU Utilization
Diagnosing a Scaling Regression in Hybrid Parallel LLM Training
Stabilizing and Scaling an LLM Training Job Across Two GPU Clusters
Choosing a Distributed Training Configuration After a Hardware Refresh
Selecting a Hybrid Parallelism + Mixed-Precision Strategy for a Memory-Bound LLM Training Run
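The scenario questions above all revolve around the same data-parallel training step: each worker computes gradients on its own smaller batch, the gradients are aggregated across workers (e.g., by averaging in an all-reduce), and every model replica then applies the identical parameter update. A minimal single-process sketch of that sequence, using plain NumPy to stand in for the distributed machinery (all names and sizes here are illustrative assumptions, not from the text):

import numpy as np

k = 8                  # number of workers (hypothetical)
params = np.zeros(10)  # one replica's copy of the shared parameters
lr = 0.1               # learning rate

# 1. Each worker computes a gradient on its own smaller batch.
#    Here the per-worker gradients are faked with random values.
worker_grads = [np.random.rand(10) for _ in range(k)]

# 2. Gradient aggregation (the "all-reduce" step): average the
#    per-worker gradients so every replica sees the same result.
avg_grad = np.mean(worker_grads, axis=0)

# 3. Every replica applies the identical update, so all k model
#    copies stay synchronized after the step.
params -= lr * avg_grad

The aggregation step is also the usual culprit in the diminishing-returns scenario: it is communication-bound, so a 10x speed-up on 16 GPUs corresponds to a parallel efficiency of 10/16 = 62.5%, down from roughly 100% at 8 GPUs.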
Learn After
A training algorithm processes a large mini-batch of 512 data samples by distributing the workload across 8 parallel workers. Each worker has a complete copy of the model. How is the data from this single large mini-batch handled by the system for one computation step?
Analyzing a Data Parallelism Implementation
Data Distribution in Parallel Training
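For the 512-sample question above, the arithmetic is direct: the mini-batch is split into 8 equal batches of 512 / 8 = 64 samples, each worker runs the forward and backward pass on its 64 samples, and the resulting gradients are combined before the single shared parameter update.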