
Derivative of Softmax Cross-Entropy Loss with Respect to Logits

The derivative of the softmax cross-entropy loss with respect to any unnormalized logit $o_j$ reveals an elegant and intuitive result:

$$\partial_{o_j} l(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\exp(o_j)}{\sum_{k=1}^q \exp(o_k)} - y_j = \mathrm{softmax}(\mathbf{o})_j - y_j$$
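This follows in two steps: substituting the softmax into the cross-entropy and using the fact that the one-hot entries of $\mathbf{y}$ sum to one gives

$$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^q y_j \log \frac{\exp(o_j)}{\sum_{k=1}^q \exp(o_k)} = \log \sum_{k=1}^q \exp(o_k) - \sum_{j=1}^q y_j o_j,$$

and differentiating with respect to $o_j$ yields the expression above.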

This derivative is exactly the difference between the conditional probability the model's softmax assigns to class $j$ and the actual observation recorded in the one-hot label vector $\mathbf{y}$. The same property holds for any exponential-family model, and it makes computing gradients for backpropagation straightforward.
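A quick numerical check makes the result concrete. Below is a minimal sketch (assuming NumPy; the `softmax` and `cross_entropy` helpers and all variable names are illustrative, not from the original text) comparing the analytic gradient against a finite-difference estimate of the loss:

```python
import numpy as np

def softmax(o):
    # Shift by the max logit for numerical stability before exponentiating.
    e = np.exp(o - o.max())
    return e / e.sum()

def cross_entropy(o, y):
    # l(y, y_hat) = -sum_j y_j * log softmax(o)_j
    return -np.sum(y * np.log(softmax(o)))

rng = np.random.default_rng(0)
o = rng.normal(size=5)   # unnormalized logits (illustrative)
y = np.zeros(5)
y[2] = 1.0               # one-hot label for class 2

analytic = softmax(o) - y  # the derivative stated above

# Central finite differences, perturbing one logit at a time.
eps = 1e-6
numeric = np.array([
    (cross_entropy(o + eps * np.eye(5)[j], y) -
     cross_entropy(o - eps * np.eye(5)[j], y)) / (2 * eps)
    for j in range(5)
])

print(np.allclose(analytic, numeric, atol=1e-8))  # prints True
```

The two gradients agree to within finite-difference error, confirming that backpropagation through softmax cross-entropy reduces to the simple subtraction $\mathrm{softmax}(\mathbf{o}) - \mathbf{y}$.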

Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L
