Cosine Learning Rate Scheduler Implementation

A cosine learning rate scheduler can be implemented from scratch as a small Python class. Given the current step $t$, the class returns a learning rate that decays along a cosine curve. It optionally includes a warmup phase of $t_{\text{warmup}}$ steps during which the learning rate increases linearly from an initial value to the base rate $\eta_0$. After warmup and up to the maximum update step, the learning rate is $\eta_t = \eta_T + \frac{\eta_0 - \eta_T}{2} \left(1 + \cos\left(\frac{\pi (t - t_{\text{warmup}})}{T_{\text{max\_steps}}}\right)\right)$, where $\eta_T$ is the final learning rate and $T_{\text{max\_steps}}$ is the maximum update step minus the number of warmup steps. Beyond the maximum update step, the scheduler keeps returning the last rate it computed. The following code demonstrates this implementation and plots the resulting schedule:

import math

import torch
from d2l import torch as d2l


class CosineScheduler:
    def __init__(self, max_update, base_lr=0.01, final_lr=0,
                 warmup_steps=0, warmup_begin_lr=0):
        self.base_lr_orig = base_lr
        # Initialized so a first call past max_update does not read an unset attribute.
        self.base_lr = base_lr
        self.max_update = max_update
        self.final_lr = final_lr
        self.warmup_steps = warmup_steps
        self.warmup_begin_lr = warmup_begin_lr
        # Number of steps over which the cosine decay runs.
        self.max_steps = self.max_update - self.warmup_steps

    def get_warmup_lr(self, epoch):
        # Linear ramp from warmup_begin_lr towards base_lr_orig.
        increase = (self.base_lr_orig - self.warmup_begin_lr) \
            * float(epoch) / float(self.warmup_steps)
        return self.warmup_begin_lr + increase

    def __call__(self, epoch):
        if epoch < self.warmup_steps:
            return self.get_warmup_lr(epoch)
        if epoch <= self.max_update:
            # Cosine decay from base_lr_orig down to final_lr.
            self.base_lr = self.final_lr + (
                self.base_lr_orig - self.final_lr) * (1 + math.cos(
                math.pi * (epoch - self.warmup_steps) / self.max_steps)) / 2
        # Past max_update, keep returning the last computed rate.
        return self.base_lr


# num_epochs is not defined in this snippet; 30 is an assumed value.
num_epochs = 30
scheduler = CosineScheduler(max_update=20, base_lr=0.3, final_lr=0.01)
d2l.plot(torch.arange(num_epochs), [scheduler(t) for t in range(num_epochs)])
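To see the warmup and decay phases without the plotting dependencies, the same schedule can be sketched as a stateless function. This is an illustrative variant, not part of the original implementation: the name `cosine_lr`, the 5-step warmup, and the 30-step horizon are assumptions chosen for the example. Unlike the class, it clamps the step at `max_update` instead of remembering the last computed rate, and it assumes `max_update` is greater than `warmup_steps`.

```python
import math


def cosine_lr(t, max_update, base_lr=0.01, final_lr=0.0,
              warmup_steps=0, warmup_begin_lr=0.0):
    """Cosine schedule with optional linear warmup (hypothetical helper)."""
    if t < warmup_steps:
        # Linear ramp from warmup_begin_lr towards base_lr.
        return warmup_begin_lr + (base_lr - warmup_begin_lr) * t / warmup_steps
    # Clamp the step so the rate stays at final_lr once max_update is reached.
    t = min(t, max_update)
    max_steps = max_update - warmup_steps  # length of the cosine phase
    progress = (t - warmup_steps) / max_steps
    return final_lr + (base_lr - final_lr) * (1 + math.cos(math.pi * progress)) / 2


# Assumed settings: 5 warmup steps, then cosine decay until step 20.
schedule = [
    cosine_lr(t, max_update=20, base_lr=0.3, final_lr=0.01, warmup_steps=5)
    for t in range(30)
]

# Inspect the start of warmup, the peak, the end of decay, and a later step.
for t in (0, 5, 20, 29):
    print(f"step {t:2d}: lr = {schedule[t]:.4f}")
```

With these settings the rate starts at `warmup_begin_lr`, reaches `base_lr` at the end of warmup, and settles at `final_lr` from step 20 onward. A stateless function is convenient when the schedule is evaluated out of order, since the result depends only on `t`.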

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L