Learn Before
Concept
Gradient Buffer Accumulation
In deep learning frameworks, the behavior of the gradient buffer after a backward pass varies. PyTorch accumulates newly computed gradients, adding them to the values already stored in the buffer. This accumulation is convenient when optimizing the sum of multiple objective functions, but it requires the programmer to explicitly reset the gradients to zero (e.g., with `zero_grad()`) before computing gradients for a new iteration. In contrast, frameworks such as MXNet and TensorFlow automatically reset the gradient buffer whenever a new gradient is recorded, overwriting the previously stored values.
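A minimal PyTorch sketch of the accumulation behavior (the variable names and example functions are illustrative; `backward()` and `Tensor.grad.zero_()` are standard PyTorch APIs):

```python
import torch

# A scalar parameter with gradient tracking enabled.
x = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(x^2)/dx = 2x = 4.
y = x * x
y.backward()
print(x.grad)  # tensor(4.)

# Second backward pass WITHOUT resetting: the new gradient
# d(3x)/dx = 3 is ADDED to the stored value, giving 4 + 3 = 7.
z = 3 * x
z.backward()
print(x.grad)  # tensor(7.)

# Explicitly zero the buffer before the next iteration
# (when training with an optimizer, optimizer.zero_grad()
# does the same for all parameters it manages).
x.grad.zero_()
w = 3 * x
w.backward()
print(x.grad)  # tensor(3.)
```

Note that the accumulated value 7 equals the gradient of the sum y + z, which is why this behavior is useful when optimizing a sum of objectives.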
Updated 2026-05-02
Tags
D2L
Dive into Deep Learning @ D2L