Learn Before
Concept
Gradient Reduction for Non-Scalar Outputs
While the Jacobian matrix provides the full derivative of a vector-valued output, in machine learning it is more common to sum the gradients of each component of an output vector y with respect to the full input vector x. This reduction yields a gradient vector with exactly the same shape as x, a technique frequently used to aggregate the gradients computed individually for each training example in a batch.
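As an illustration, here is a minimal sketch in PyTorch (the framework used in the D2L text); the variable names and values are illustrative, not taken from the source:

```python
import torch

# x is a vector input; track gradients with respect to it.
x = torch.arange(4.0, requires_grad=True)

# y = x * x is a non-scalar (vector) output, so backward() would
# normally require an explicit gradient argument.
y = x * x

# Summing first reduces y to a scalar, so backward() works directly.
# This is equivalent to y.backward(torch.ones_like(y)).
y.sum().backward()

print(x.grad)                    # tensor([0., 2., 4., 6.])
print(x.grad.shape == x.shape)   # True: the gradient matches x's shape
```

The printed gradient is 2x for each component, and its shape matches the input x rather than forming a full Jacobian matrix.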
Updated 2026-05-02
Tags
D2L
Dive into Deep Learning @ D2L