Multi-GPU Training Loop Implementation
The train function orchestrates the full multi-GPU data-parallel training loop. It accepts the number of GPUs, batch size, and learning rate as arguments. The procedure is:
- Setup: Load the Fashion-MNIST dataset, select the requested number of GPU devices using d2l.try_gpu(i), and replicate the model parameters to each GPU via get_params.
- Epoch loop: For each of the num_epochs epochs (fixed at 10), iterate over every minibatch in the training set and call train_batch to perform the data-parallel forward pass, gradient synchronization, and parameter update. A synchronization barrier (torch.cuda.synchronize()) is called after every minibatch so that all GPU operations complete before the next minibatch begins. Because CUDA kernels launch asynchronously, this barrier is also necessary for accurate epoch timing.
- Evaluation: After each epoch, test accuracy is computed on a single GPU (GPU 0) using d2l.evaluate_accuracy_gpu, passing a lambda that wraps the model with the first device's parameters. Although evaluating on only one GPU leaves the others idle, it simplifies the code.
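```python
import torch
from d2l import torch as d2l

# get_params, params, lenet, and train_batch are assumed to be defined
# elsewhere; they are not shown here.

def train(num_gpus, batch_size, lr):
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    devices = [d2l.try_gpu(i) for i in range(num_gpus)]
    # Copy the model parameters to each of the num_gpus devices
    device_params = [get_params(params, d) for d in devices]
    num_epochs = 10
    animator = d2l.Animator('epoch', 'test acc', xlim=[1, num_epochs])
    timer = d2l.Timer()
    for epoch in range(num_epochs):
        timer.start()
        for X, y in train_iter:
            # Data-parallel training step for a single minibatch
            train_batch(X, y, device_params, devices, lr)
            # Wait for all GPU work to finish before the next minibatch
            torch.cuda.synchronize()
        timer.stop()
        # Evaluate the model on GPU 0 only
        animator.add(epoch + 1, (d2l.evaluate_accuracy_gpu(
            lambda x: lenet(x, device_params[0]), test_iter, devices[0]),))
    print(f'test acc: {animator.Y[0][-1]:.2f}, '
          f'{timer.avg():.1f} sec/epoch on {str(devices)}')
```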
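To make the structure of the loop easier to follow without GPUs, here is a minimal CPU-only sketch of the same pattern. It is an illustration under simplifying assumptions, not the D2L implementation: a two-parameter linear model stands in for LeNet, plain Python lists stand in for per-device parameter copies, synthetic data replaces Fashion-MNIST, and time.perf_counter replaces d2l.Timer. The names split_batch, shard_grad, train_batch_toy, mse, and train_toy are hypothetical. No synchronization barrier appears because everything here runs synchronously.

```python
import random
import time


def split_batch(batch, num_replicas):
    """Split a minibatch into num_replicas contiguous shards."""
    size = (len(batch) + num_replicas - 1) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def shard_grad(params, shard):
    """Gradient of the summed squared error on one shard (one 'device')."""
    w, b = params
    gw = sum(2 * (w * x + b - y) * x for x, y in shard)
    gb = sum(2 * (w * x + b - y) for x, y in shard)
    return [gw, gb]


def train_batch_toy(batch, replica_params, lr):
    """One data-parallel step: local gradients, summed, applied everywhere."""
    shards = split_batch(batch, len(replica_params))
    grads = [shard_grad(p, s) for p, s in zip(replica_params, shards)]
    # Stand-in for allreduce: element-wise sum of the per-replica gradients
    total = [sum(g[i] for g in grads) for i in range(2)]
    # Every replica applies the same SGD update, scaled by the full batch size
    for p in replica_params:
        for i in range(2):
            p[i] -= lr * total[i] / len(batch)


def mse(params, data):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_toy(num_replicas, batch_size, lr, num_epochs=10, seed=0):
    rng = random.Random(seed)
    # Synthetic data drawn from y = 2x - 1 (illustrative only)
    train_data = [(x, 2 * x - 1) for x in
                  (rng.uniform(-1, 1) for _ in range(64))]
    test_data = [(x, 2 * x - 1) for x in
                 (rng.uniform(-1, 1) for _ in range(16))]
    # Setup: replicate the initial parameters once per replica
    replica_params = [[0.0, 0.0] for _ in range(num_replicas)]
    epoch_times = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        for i in range(0, len(train_data), batch_size):
            train_batch_toy(train_data[i:i + batch_size], replica_params, lr)
        epoch_times.append(time.perf_counter() - start)
        # Evaluation: use replica 0 only, mirroring evaluation on GPU 0
        test_mse = mse(replica_params[0], test_data)
    return replica_params, test_mse, sum(epoch_times) / len(epoch_times)


if __name__ == "__main__":
    params, test_mse, avg_time = train_toy(num_replicas=2, batch_size=16, lr=0.2)
    print(f"test mse: {test_mse:.4f}, {avg_time:.6f} sec/epoch")
```

Because every replica starts from the same values and applies the same summed gradient, the copies stay identical after each step. That is why evaluating with the first replica's parameters alone is sufficient.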
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L