Concept

Gradient Reduction for Non-Scalar Outputs

While the Jacobian matrix provides the full derivative of a vector output, it is more common in machine learning to sum the gradients of each component of an output vector $\mathbf{y}$ with respect to the full input vector $\mathbf{x}$. This reduction yields a gradient vector that has the exact same shape as $\mathbf{x}$, a technique frequently used to aggregate gradients calculated individually for each training example in a batch.
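A minimal sketch of this reduction in PyTorch (the framework D2L builds on); the specific tensors here are illustrative, not from the source:

```python
import torch

# Input vector with gradient tracking enabled.
x = torch.arange(4.0, requires_grad=True)

# Non-scalar output: y has the same shape as x.
y = 2 * x * x

# Reduce y to a scalar by summing before backpropagating, so the
# resulting gradient has the same shape as x.
# d(sum(2 * x^2)) / dx_i = 4 * x_i for each component.
y.sum().backward()

print(x.grad)           # tensor([ 0.,  4.,  8., 12.])
print(x.grad == 4 * x)  # tensor([True, True, True, True])
```

Equivalently, `y.backward(gradient=torch.ones_like(y))` computes the same vector-Jacobian product: summing the components of $\mathbf{y}$ corresponds to multiplying the Jacobian by a vector of ones.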

Tags

D2L

Dive into Deep Learning @ D2L