Square Root Learning Rate Scheduler

A simple learning rate scheduler can be defined to decay the learning rate in proportion to the inverse square root of the number of updates. Specifically, at step $t$, the learning rate is set to $\eta = \eta_0 (t + 1)^{-\frac{1}{2}}$, where $\eta_0$ is the initial learning rate. This formulation ensures that the step size decreases gently and continuously as training progresses.
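The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the class name `SquareRootScheduler`, the parameter name `base_lr` (standing in for $\eta_0$), and the default value 0.1 are assumptions chosen for the example.

```python
class SquareRootScheduler:
    """Compute eta = eta_0 * (t + 1) ** -0.5 for update step t."""

    def __init__(self, base_lr=0.1):
        # base_lr plays the role of eta_0; 0.1 is an arbitrary illustrative default.
        self.base_lr = base_lr

    def __call__(self, step):
        # step is the zero-based update count t, so step 0 returns base_lr itself.
        return self.base_lr * (step + 1) ** -0.5


scheduler = SquareRootScheduler(base_lr=0.1)

# Learning rates at a few selected steps: roughly 0.1, 0.05, 0.033, 0.01.
rates = [scheduler(t) for t in (0, 3, 8, 99)]
```

Because the exponent is $-\frac{1}{2}$, quadrupling $t + 1$ halves the learning rate: the rate drops quickly over the first few updates and then changes ever more slowly.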


Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L