Formula

Adam Bias Correction

In the Adam optimizer, the state variables for momentum ($\mathbf{v}_t$) and the second moment ($\mathbf{s}_t$) are typically initialized to zero ($\mathbf{v}_0 = \mathbf{s}_0 = 0$). Because both are exponentially weighted moving averages, this initialization biases them towards smaller values, most strongly during the initial training steps. To correct this bias, Adam re-normalizes the terms using the geometric sum $\sum_{i=0}^{t-1} \beta^i = \frac{1 - \beta^t}{1 - \beta}$: the weights $(1 - \beta)\beta^i$ placed on past gradients add up to $1 - \beta^t$ rather than 1, so dividing by $1 - \beta^t$ restores a properly normalized average. The resulting debiased, or normalized, state variables are computed as $\hat{\mathbf{v}}_t = \frac{\mathbf{v}_t}{1 - \beta_1^t}$ and $\hat{\mathbf{s}}_t = \frac{\mathbf{s}_t}{1 - \beta_2^t}$.
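The effect of the correction can be seen in a minimal sketch. It assumes a scalar "gradient" held constant at 1.0 and a single decay rate `beta`; the function name `ema_with_bias_correction` is hypothetical and chosen only for illustration, not taken from any library. It covers only the moving average and its debiasing, not the full Adam parameter update.

```python
def ema_with_bias_correction(grads, beta):
    """Track a zero-initialized moving average and its debiased version.

    Hypothetical helper for illustration: `grads` is a sequence of scalar
    gradients and `beta` is the decay rate (beta_1 or beta_2 in Adam).
    """
    v = 0.0  # state initialized to zero, as in Adam
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        v = beta * v + (1 - beta) * g          # exponentially weighted average
        raw.append(v)                          # biased towards zero for small t
        corrected.append(v / (1 - beta ** t))  # divide by 1 - beta^t to debias
    return raw, corrected


# Assumed toy input: the same gradient value at every step.
raw, corrected = ema_with_bias_correction([1.0] * 5, beta=0.9)
for t, (r, c) in enumerate(zip(raw, corrected), start=1):
    print(f"t={t}  raw={r:.4f}  corrected={c:.4f}")
```

With a constant gradient, the raw average starts near $1 - \beta$ and only slowly approaches the true value, while the corrected estimate matches it from the first step. This is the situation the division by $1 - \beta^t$ is meant to handle.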


Updated 2026-05-16


Tags

D2L

Dive into Deep Learning @ D2L
