Learn Before
Concept
Implicit Gradient Summation in Deep Learning Frameworks
When differentiating non-scalar outputs, deep learning frameworks handle gradient reduction differently. TensorFlow and MXNet implicitly reduce the output tensor to a scalar by summing all of its elements, then return the gradient of that sum rather than the full Jacobian matrix. In contrast, JAX performs no implicit summation: its `grad` function is defined only for scalar-valued functions, so the user must explicitly construct a scalar output (for example, by calling `.sum()`) before applying the gradient transformation.
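A minimal sketch of the JAX behavior described above: `jax.grad` rejects a vector-valued function, but works once the output is reduced to a scalar with `.sum()`.

```python
import jax
import jax.numpy as jnp

x = jnp.arange(4.0)  # [0., 1., 2., 3.]

def f(x):
    # Vector-valued: same shape as x
    return 2 * x * x

# jax.grad requires a scalar output, so sum explicitly first.
# The gradient of sum(2 * x_i^2) with respect to x is 4 * x.
grad_fn = jax.grad(lambda x: f(x).sum())
print(grad_fn(x))  # [ 0.  4.  8. 12.]

# Applying jax.grad directly to the vector-valued f fails at call time:
try:
    jax.grad(f)(x)
except TypeError as e:
    print("grad on vector output raised:", type(e).__name__)
```

This mirrors what TensorFlow and MXNet do implicitly: summing first yields the same per-element gradients as reducing the output to a scalar inside the framework.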
Updated 2026-05-02
Tags
D2L
Dive into Deep Learning @ D2L