Gradient of an Element-wise Product Example
When differentiating a vector-valued function, such as the element-wise product y = x ⊙ x, deep learning frameworks require the output to be reduced to a scalar in order to compute a gradient vector of the same shape as the input. For example, given the input vector x = [0, 1, 2, 3], the element-wise product yields y = [0, 1, 4, 9]. Reducing this output by summing its elements gives sum(y) = 14, and the gradient of the sum with respect to x is 2x, resulting in the vector [0, 2, 4, 6]. Frameworks handle this reduction differently: TensorFlow and MXNet implicitly sum the output vector; PyTorch requires either passing a vector of ones via the gradient argument (e.g., y.backward(gradient=torch.ones(len(y)))) or reducing with an explicit sum; and JAX requires the function itself to return a scalar sum before applying the grad transform.
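The JAX behavior described above can be sketched as follows; this is a minimal example, assuming the same input vector x = [0, 1, 2, 3] as in the text, with the scalar reduction written explicitly inside the differentiated function:

```python
import jax
import jax.numpy as jnp

# Input vector from the example: x = [0., 1., 2., 3.]
x = jnp.arange(4.0)

# jax.grad requires a scalar-valued function, so the element-wise
# product is explicitly summed before differentiation.
def f(x):
    return jnp.sum(x * x)  # sum of x_i^2

print(f(x))            # scalar reduction: 14.0
print(jax.grad(f)(x))  # gradient 2x: [0. 2. 4. 6.]
```

Passing a function whose output is still a vector to jax.grad raises an error, which is why the sum must be part of the function rather than applied afterward.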
Dive into Deep Learning @ D2L