Cosine Learning Rate Schedule

A cosine learning rate schedule, proposed by Loshchilov and Hutter (2016), adjusts the learning rate by following the shape of a cosine curve. It relies on the observation that the learning rate should not decrease too drastically at the beginning of training, and that the solution should be refined toward the end using a very small learning rate. For update steps $t \in [0, T]$, this results in a schedule with the functional form:

$$\eta_t = \eta_T + \frac{\eta_0 - \eta_T}{2} \left(1 + \cos(\pi t/T)\right)$$

Here, $\eta_0$ is the initial learning rate and $\eta_T$ is the target rate at the maximum update step $T$. For steps $t > T$, the learning rate is simply pinned to $\eta_T$ without increasing it again.
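
The formula can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `cosine_lr`, its parameter names, and the values $\eta_0 = 0.3$, $\eta_T = 0.01$, $T = 20$ are assumptions chosen for the example.

```python
import math


def cosine_lr(t, eta_0, eta_T, T):
    """Learning rate at update step t under the cosine schedule.

    eta_0: initial learning rate, eta_T: target rate reached at step T.
    (Hypothetical helper; names are illustrative.)
    """
    if t >= T:
        # Past the maximum update step the rate stays pinned to eta_T.
        return eta_T
    return eta_T + (eta_0 - eta_T) / 2 * (1 + math.cos(math.pi * t / T))


# Illustrative values only: print the rate at a few steps.
for t in (0, 5, 10, 15, 20, 25):
    print(t, cosine_lr(t, eta_0=0.3, eta_T=0.01, T=20))
```

At $t = 0$ the cosine term equals 1, so the rate is $\eta_0$. At $t = T/2$ it is the midpoint $(\eta_0 + \eta_T)/2$, and at $t = T$ it reaches $\eta_T$. Because the cosine is flat near both ends, the decay is slow at the start and at the end of training and fastest in the middle.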

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L