Learn Before
Formula
Adam Bias Correction
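In the Adam optimizer, the state variables for momentum ($\mathbf{v}_t$) and the second moment ($\mathbf{s}_t$) are typically initialized to zero ($\mathbf{v}_0 = \mathbf{s}_0 = \mathbf{0}$). This initialization biases both estimates towards smaller values during the initial training steps. To correct this bias, Adam re-normalizes the terms using the sum of the weights, $\sum_{i=0}^{t-1} \beta^i = \frac{1 - \beta^t}{1 - \beta}$. Because each moving average carries a leading factor of $(1 - \beta)$, its weights after $t$ steps sum to $1 - \beta^t$ rather than 1, so dividing by that quantity removes the bias. The resulting debiased, or normalized, state variables are computed as $\hat{\mathbf{v}}_t = \frac{\mathbf{v}_t}{1 - \beta_1^t}$ and $\hat{\mathbf{s}}_t = \frac{\mathbf{s}_t}{1 - \beta_2^t}$.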
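The effect of the correction can be checked numerically. The following is a minimal sketch, not a full optimizer: it assumes the standard Adam moving-average updates for the two state variables, scalar gradients, and the commonly used values 0.9 and 0.999 for the two decay rates. The function name `debiased_moments` is hypothetical and chosen only for illustration.

```python
def debiased_moments(grads, beta1=0.9, beta2=0.999):
    """Track raw and bias-corrected Adam moments for scalar gradients.

    Assumes the standard exponential moving averages
        v_t = beta1 * v_{t-1} + (1 - beta1) * g_t
        s_t = beta2 * s_{t-1} + (1 - beta2) * g_t**2
    with v_0 = s_0 = 0. The default betas are illustrative assumptions.
    """
    v, s = 0.0, 0.0  # zero initialization, the source of the bias
    history = []
    for t, g in enumerate(grads, start=1):
        v = beta1 * v + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g * g
        # Divide by the sum of the averaging weights after t steps.
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        history.append((v, s, v_hat, s_hat))
    return history


# A constant gradient of 1.0 makes the bias easy to see: the raw moments
# start far below 1, while the corrected ones stay at about 1.
history = debiased_moments([1.0] * 5)
for t, (v, s, v_hat, s_hat) in enumerate(history, start=1):
    print(f"t={t}  v={v:.4f}  s={s:.6f}  v_hat={v_hat:.4f}  s_hat={s_hat:.4f}")
```

With a constant gradient, the raw estimates only approach the true value gradually, whereas the corrected estimates match it from the first step. As the step count grows, the denominators tend to 1 and the correction fades away.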
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L