
Gradient of an Element-wise Product Example

When differentiating a vector-valued function, such as the element-wise product y = x * x, deep learning frameworks require the output to be reduced to a scalar in order to compute a gradient vector of the same shape as the input. For example, given the input vector x = [0, 1, 2, 3], the element-wise product yields y = [0, 1, 4, 9]. Summing the elements of this output gives the scalar ∑_i x_i^2, whose gradient is 2x, i.e., the vector [0, 2, 4, 6]. Frameworks handle this reduction differently: TensorFlow and MXNet implicitly sum the output vector; PyTorch requires either passing a vector of ones via the gradient argument (e.g., y.backward(gradient=torch.ones(len(y)))) or using an explicit sum; and JAX requires the function to return a scalar sum explicitly before applying the grad transform.
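The arithmetic in this example can be checked without any deep learning framework. The sketch below, a minimal illustration in plain Python, reduces y = x * x to the scalar sum ∑_i x_i^2 and approximates its gradient by central finite differences; the helper names f and numerical_grad are invented for this illustration, not part of any framework API.

```python
def f(x):
    # Scalar reduction of the element-wise product: sum_i x_i^2.
    return sum(v * v for v in x)

def numerical_grad(f, x, eps=1e-6):
    # Central finite-difference approximation of the gradient of f at x.
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

x = [0.0, 1.0, 2.0, 3.0]
print([round(g, 4) for g in numerical_grad(f, x)])  # [0.0, 2.0, 4.0, 6.0]
```

The result matches the analytic gradient 2x = [0, 2, 4, 6], which is the same vector the frameworks produce after their respective reductions.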

Updated 2026-05-01


Dive into Deep Learning @ D2L