Concept

Implicit Gradient Summation in Deep Learning Frameworks

When differentiating non-scalar outputs, deep learning frameworks handle gradient reduction differently. TensorFlow and MXNet implicitly reduce the output tensor to a scalar by summing all elements of the output vector $\mathbf{y}$, calculating the gradient of that sum to return $\partial_{\mathbf{x}} \sum_i y_i$ instead of the full Jacobian matrix $\partial_{\mathbf{x}} \mathbf{y}$. In contrast, JAX does not perform implicit summation; its grad function is strictly defined for scalar outputs, meaning the user must explicitly construct a scalar-valued function (such as calling .sum()) prior to applying the gradient operation.
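A minimal sketch of this contrast, assuming TensorFlow and JAX are installed; the quadratic function y = x * x and the input values are illustrative choices, not taken from the original text:

```python
import tensorflow as tf
import jax
import jax.numpy as jnp

# TensorFlow: GradientTape.gradient() on a non-scalar target implicitly
# sums the elements of y, returning d(sum_i y_i)/dx, not the full Jacobian.
x_tf = tf.range(4, dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(x_tf)           # x_tf is a plain tensor, so watch it explicitly
    y = x_tf * x_tf            # vector-valued output
print(tape.gradient(y, x_tf))  # [0. 2. 4. 6.], i.e. 2x

# JAX: jax.grad is defined only for scalar-valued functions, so the
# reduction to a scalar must be written out explicitly with .sum().
x_jax = jnp.arange(4.0)
print(jax.grad(lambda x: (x * x).sum())(x_jax))  # [0. 2. 4. 6.], i.e. 2x
```

Calling jax.grad on a function that returns a vector raises an error rather than summing silently, which is exactly the behavioral difference described above.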


Updated 2026-05-02

Tags

D2L

Dive into Deep Learning @ D2L