Code

Square Root Learning Rate Scheduler Implementation

A square root learning rate scheduler can be implemented from scratch as a custom class that acts as a callable function. When invoked with the current number of updates, it calculates and returns the decayed learning rate using the formula η0(t+1)12\eta_0 (t + 1)^{-\frac{1}{2}}. The following Python code demonstrates this implementation:

class SquareRootScheduler: def __init__(self, lr=0.1): self.lr = lr def __call__(self, num_update): return self.lr * pow(num_update + 1.0, -0.5)

0

1

Updated 2026-05-18

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L