Learning Rate Warmup Schedule Example

A learning rate warmup can be combined with other schedules, such as a cosine schedule, to improve initial convergence. In deep learning frameworks, this is often configured through a warmup_steps parameter. For example, a cosine scheduler can be configured to increase the learning rate linearly over the first 5 steps before applying the standard cosine decay. Plotting the schedule over the epochs shows this initial linear increase followed by a gradual decay toward the final learning rate.

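# CosineScheduler, num_epochs, torch, and d2l are assumed to be defined or imported earlier.
# Warm up linearly for 5 steps, then decay from base_lr to final_lr.
scheduler = CosineScheduler(20, warmup_steps=5, base_lr=0.3, final_lr=0.01)
d2l.plot(torch.arange(num_epochs), [scheduler(t) for t in range(num_epochs)])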
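
To make the shape of such a schedule concrete, here is a minimal, self-contained sketch of linear warmup followed by cosine decay. It is not the D2L implementation: the function name warmup_cosine_lr, the warmup_begin_lr parameter, and the value num_epochs = 30 are assumptions chosen for illustration. The other defaults mirror the values in the snippet above, with 20 read as the step at which the decay ends.

```python
import math


def warmup_cosine_lr(step, max_update=20, base_lr=0.3, final_lr=0.01,
                     warmup_steps=5, warmup_begin_lr=0.0):
    """Hypothetical helper: linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from warmup_begin_lr toward base_lr.
        return warmup_begin_lr + (base_lr - warmup_begin_lr) * step / warmup_steps
    if step >= max_update:
        # Past the decay window, hold the final learning rate.
        return final_lr
    # Cosine decay from base_lr (end of warmup) to final_lr (at max_update).
    progress = (step - warmup_steps) / (max_update - warmup_steps)
    return final_lr + (base_lr - final_lr) * (1 + math.cos(math.pi * progress)) / 2


num_epochs = 30  # assumed value, for illustration only
schedule = [warmup_cosine_lr(t) for t in range(num_epochs)]
```

Under these assumptions, the rate starts at warmup_begin_lr, reaches base_lr at step 5, and settles at final_lr from step 20 onward. The list schedule can be passed to any plotting routine in place of the scheduler calls.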

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L