Gradient of an Element-wise Product Example
When differentiating a vector-valued function, such as the element-wise product y = x ⊙ x, deep learning frameworks require the output to be reduced to a scalar in order to compute a gradient vector of the same shape as the input. For example, given the input vector x = [0, 1, 2, 3], the element-wise product yields y = [0, 1, 4, 9]. Reducing this output by summing its elements gives sum(y) = 14, and the gradient of the sum with respect to x is 2x, resulting in the vector [0, 2, 4, 6]. Frameworks handle this reduction differently: TensorFlow and MXNet implicitly sum the output vector; PyTorch requires either passing a vector of ones via the gradient argument (e.g., y.backward(gradient=torch.ones(len(y)))) or reducing with an explicit sum; and JAX requires the function itself to return a scalar sum before applying the grad transform.
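The JAX behavior described above can be sketched as follows; this is a minimal example, assuming the same input vector x = [0, 1, 2, 3] as in the text, with the scalar reduction written explicitly inside the differentiated function:

```python
import jax
import jax.numpy as jnp

# Input vector from the example: x = [0., 1., 2., 3.]
x = jnp.arange(4.0)

# jax.grad requires a scalar-valued function, so the element-wise
# product is explicitly summed before differentiation.
def f(x):
    return jnp.sum(x * x)  # sum of x_i^2

print(f(x))            # scalar reduction: 14.0
print(jax.grad(f)(x))  # gradient 2x: [0. 2. 4. 6.]
```

Passing a function whose output is still a vector to jax.grad raises an error, which is why the sum must be part of the function rather than applied afterward.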
Dive into Deep Learning @ D2L