Example
Learning Rate Warmup Training Example
To observe the effect of learning rate warmup empirically, train a neural network with an optimizer whose learning rate follows a schedule that begins with a warmup period. The training metrics typically show better convergence early on, particularly during the warmup epochs, than the same run without warmup. Keeping the first updates small while the parameters are still close to their random initialization limits early divergence, which stabilizes optimization, especially for deeper networks.
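# net_fn, train, train_iter, test_iter, num_epochs, loss, device and scheduler
# are assumed to be defined earlier; scheduler is the warmup schedule.
net = net_fn()
trainer = torch.optim.SGD(net.parameters(), lr=0.3)
train(net, train_iter, test_iter, num_epochs, loss, trainer, device, scheduler)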
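The snippet above passes a `scheduler` object without showing how it is built. As a rough illustration only, here is a minimal sketch of one possible warmup schedule: a linear ramp followed by cosine decay. The function name `warmup_cosine_lr` is hypothetical, and every parameter value other than the base rate of 0.3 (a 5-epoch warmup, decay until epoch 20, a final rate of 0.01) is an assumption chosen for illustration, not something stated in this example.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.3, final_lr=0.01,
                     warmup_steps=5, max_update=20, warmup_begin_lr=0.0):
    """Return the learning rate for `epoch` (hypothetical helper).

    Linear warmup for `warmup_steps` epochs, then cosine decay to `final_lr`.
    All defaults except base_lr=0.3 are illustrative assumptions.
    """
    if epoch < warmup_steps:
        # Linear ramp from warmup_begin_lr up to base_lr.
        ramp = (base_lr - warmup_begin_lr) * epoch / warmup_steps
        return warmup_begin_lr + ramp
    # Clamp so the rate stays at final_lr once max_update is reached.
    t = min(epoch, max_update) - warmup_steps
    decay_steps = max_update - warmup_steps
    cosine = (1 + math.cos(math.pi * t / decay_steps)) / 2
    return final_lr + (base_lr - final_lr) * cosine


# Learning rate for each of 30 epochs, e.g. for plotting or inspection.
schedule = [warmup_cosine_lr(epoch) for epoch in range(30)]
```

With a PyTorch optimizer, one way to apply such a function is to assign its value to `param_group["lr"]` for every entry in `trainer.param_groups` at the start of each epoch. Whether the `train` helper used here does this internally is not shown in the example, so treat that wiring as an assumption.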
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L