Gradient Buffer Accumulation

Deep learning frameworks differ in how they treat the gradient buffer after a backward pass. PyTorch accumulates: each newly computed gradient is added to the values already stored in the buffer. This is convenient when optimizing the sum of multiple objective functions, but it means the programmer must explicitly zero the gradients (for example, with zero_grad) before computing gradients for a new iteration. In contrast, frameworks such as MXNet and TensorFlow overwrite the buffer, replacing the previously stored values whenever a new gradient is recorded.
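A minimal PyTorch sketch of this behavior, using the scalar function y = x² (whose gradient is 2x): two backward passes without zeroing cause the gradients to add up, while zeroing the buffer restores an independent gradient.

```python
import torch

# Scalar input with gradient tracking; dy/dx of y = x*x is 2x.
x = torch.tensor(3.0, requires_grad=True)

y = x * x
y.backward()
first = x.grad.item()  # 2 * 3 = 6.0

# Without zeroing, a second backward pass adds to the existing buffer.
y = x * x
y.backward()
accumulated = x.grad.item()  # 6 + 6 = 12.0

# Resetting the buffer restores an independent gradient.
x.grad.zero_()
y = x * x
y.backward()
after_reset = x.grad.item()  # 6.0 again
```

In training loops this reset is usually done via optimizer.zero_grad() rather than zeroing each tensor's buffer by hand.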

Updated 2026-05-02

Tags

D2L

Dive into Deep Learning @ D2L
