Formula
Square Root Learning Rate Scheduler
A simple learning rate scheduler decays the learning rate in proportion to the inverse square root of the number of updates. Specifically, at update step $t$, the learning rate is set to $\eta_t = \eta_0 (t+1)^{-1/2}$, where $\eta_0$ is the initial learning rate. The $+1$ keeps the rate finite at $t = 0$. The learning rate decreases monotonically as training progresses, and because the decay slows as $t$ grows, early updates reduce the step size much more than later ones.
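The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming update steps are counted from $t = 0$ so that the first step uses the full $\eta_0$. The class name `SquareRootScheduler` and its parameters `lr` and `num_update` are illustrative choices for this sketch, not a fixed API.

```python
import math


class SquareRootScheduler:
    """Inverse-square-root decay: eta_t = eta_0 * (t + 1) ** -0.5.

    Illustrative sketch; assumes steps are counted from t = 0.
    """

    def __init__(self, lr: float = 0.1) -> None:
        if lr <= 0:
            raise ValueError("initial learning rate must be positive")
        self.lr = lr  # eta_0, the initial learning rate

    def __call__(self, num_update: int) -> float:
        # (t + 1) ** -0.5 == 1 / sqrt(t + 1); the +1 avoids dividing by zero at t = 0
        return self.lr / math.sqrt(num_update + 1)


if __name__ == "__main__":
    scheduler = SquareRootScheduler(lr=0.1)
    for t in (0, 3, 15, 99):
        print(t, scheduler(t))
```

With $\eta_0 = 0.1$, the rate halves every time $t+1$ quadruples. It is 0.1 at $t = 0$, 0.05 at $t = 3$, 0.025 at $t = 15$, and 0.01 at $t = 99$. This shows how the decay is steep early in training and flattens later.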
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L
Related
Effect of Learning Rate Scheduling on Overfitting
Polynomial Learning Rate Decay
Piecewise Constant Learning Rate Schedule
Cosine Learning Rate Schedule
Optimizer Warmup
Factor Learning Rate Scheduler
Explicit Learning Rate Adjustment Implementation
Learning Rate Scheduler Toy Problem