Code

Multi-GPU Training Loop Implementation

The train function orchestrates the full multi-GPU data-parallel training loop. It accepts the number of GPUs, batch size, and learning rate as arguments. The procedure is:

  1. Setup: Load the Fashion-MNIST dataset, allocate the specified number of GPU devices using d2l.try_gpu(i), and replicate the model parameters to each GPU via get_params.
  2. Epoch loop: For each of 10 epochs, iterate over every minibatch in the training set and call train_batch to perform the data-parallel forward pass, gradient synchronization, and parameter update. torch.cuda.synchronize() is called after every minibatch so that all queued GPU operations finish before the next minibatch begins. Because CUDA kernels are launched asynchronously, this barrier is also necessary for accurate epoch timing.
  3. Evaluation: After each epoch, test accuracy is computed on a single GPU (GPU 0) using d2l.evaluate_accuracy_gpu, passing a lambda that wraps the model with the first device's parameters. Although evaluating on only one GPU leaves the others idle, it simplifies the code.
# Assumes torch and d2l are imported, and that params, get_params, lenet,
# and train_batch are defined earlier.
def train(num_gpus, batch_size, lr):
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    devices = [d2l.try_gpu(i) for i in range(num_gpus)]
    # Copy the model parameters to each of the num_gpus devices
    device_params = [get_params(params, d) for d in devices]
    num_epochs = 10
    animator = d2l.Animator('epoch', 'test acc', xlim=[1, num_epochs])
    timer = d2l.Timer()
    for epoch in range(num_epochs):
        timer.start()
        for X, y in train_iter:
            # Data-parallel training on a single minibatch
            train_batch(X, y, device_params, devices, lr)
            # Wait for all queued GPU work before moving on
            torch.cuda.synchronize()
        timer.stop()
        # Evaluate the model on GPU 0 only
        animator.add(epoch + 1, (d2l.evaluate_accuracy_gpu(
            lambda x: lenet(x, device_params[0]), test_iter, devices[0]),))
    print(f'test acc: {animator.Y[0][-1]:.2f}, '
          f'{timer.avg():.1f} sec/epoch on {str(devices)}')
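
The data-parallel step that train_batch is described as performing (shard the minibatch, compute gradients per device, synchronize them, update every copy) can be illustrated without any GPU. The following is a minimal pure-Python sketch under stated assumptions, not the actual train_batch: the model is a single scalar weight fitted with squared error, and the helper names split_batch, local_gradient, allreduce_sum, and data_parallel_step are hypothetical.

```python
# Pure-Python sketch of one data-parallel step (no GPUs or torch required).
# All function names here are hypothetical, chosen for illustration only.

def split_batch(xs, ys, num_replicas):
    """Split a minibatch into equally sized shards, one per replica."""
    assert len(xs) % num_replicas == 0, "batch size must divide evenly"
    size = len(xs) // num_replicas
    return [(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
            for i in range(num_replicas)]

def local_gradient(w, xs, ys):
    """Gradient of the summed loss 0.5 * (w * x - y) ** 2 over one shard."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))

def allreduce_sum(grads):
    """Sum the per-replica gradients and give every replica the same total."""
    total = sum(grads)
    return [total] * len(grads)

def data_parallel_step(replica_weights, xs, ys, lr):
    """Shard the data, compute local gradients, allreduce, then update."""
    shards = split_batch(xs, ys, len(replica_weights))
    grads = [local_gradient(w, sx, sy)
             for w, (sx, sy) in zip(replica_weights, shards)]
    grads = allreduce_sum(grads)
    batch_size = len(xs)
    # Every replica applies the same batch-averaged gradient,
    # so the parameter copies stay identical after the update.
    return [w - lr * g / batch_size for w, g in zip(replica_weights, grads)]

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2 * x
weights = [0.0, 0.0]        # two replicas with identical starting parameters
for _ in range(50):
    weights = data_parallel_step(weights, xs, ys, lr=0.1)
```

Because each replica receives the same summed gradient, the two weight copies remain equal throughout, and one step with two replicas matches one step with a single replica on the full minibatch. This is the property that lets the evaluation step use only the first device's parameters.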

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L