Optimizer Warmup
Choosing an initial learning rate poses a dilemma: too small a value makes progress slow, while too large a value can cause divergence. Optimizer warmup is a simple strategy for addressing this. During the warmup period, the learning rate increases gradually, typically linearly, from a small value to its maximum; it then cools down (decays) until the end of the optimization process.
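As a rough illustration, the schedule can be sketched as a plain Python function. The description above does not say how the rate cools down, so the cosine decay used here is an assumption, as are the function name `warmup_lr`, all parameter names, and the default values (a peak of 0.3 reached after 5 steps, decaying to 0.01 by step 30).

```python
import math

def warmup_lr(step, max_lr=0.3, warmup_steps=5, total_steps=30,
              warmup_begin_lr=0.0, final_lr=0.01):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay.

    All names and defaults are hypothetical, chosen for illustration only.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from warmup_begin_lr up to max_lr.
        frac = step / warmup_steps
        return warmup_begin_lr + (max_lr - warmup_begin_lr) * frac
    # Cool-down (assumed cosine): decay from max_lr to final_lr
    # over the steps that remain after warmup.
    remaining = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / remaining)
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * progress))

# Learning rate for every step of a 30-step run (steps 0 through 30).
schedule = [warmup_lr(t) for t in range(31)]
```

With these assumed defaults the rate rises over the first five steps, peaks at `max_lr`, and then falls smoothly toward `final_lr`. In practice the value returned for each step would be assigned to the optimizer's learning rate before the corresponding update.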
Tags
D2L
Dive into Deep Learning @ D2L
Related
Effect of Learning Rate Scheduling on Overfitting
Polynomial Learning Rate Decay
Piecewise Constant Learning Rate Schedule
Cosine Learning Rate Schedule
Optimizer Warmup
Factor Learning Rate Scheduler
Explicit Learning Rate Adjustment Implementation
Learning Rate Scheduler Toy Problem
Square Root Learning Rate Scheduler