
Derivative of Softmax Cross-Entropy Loss with Respect to Logits

The derivative of the softmax cross-entropy loss with respect to any unnormalized logit $o_j$ reveals an elegant and intuitive result:

$$\partial_{o_j} l(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\exp(o_j)}{\sum_{k=1}^q \exp(o_k)} - y_j = \mathrm{softmax}(\mathbf{o})_j - y_j$$
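This follows in two steps: substituting the softmax into the cross-entropy and using the fact that the one-hot entries of $\mathbf{y}$ sum to one gives

$$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^q y_j \log \frac{\exp(o_j)}{\sum_{k=1}^q \exp(o_k)} = \log \sum_{k=1}^q \exp(o_k) - \sum_{j=1}^q y_j o_j,$$

and differentiating with respect to $o_j$ yields the expression above.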

This derivative is exactly the difference between the conditional probability the model's softmax assigns to class $j$ and the actual observation recorded in the one-hot label vector $\mathbf{y}$. The same property holds for any exponential-family model, and it makes computing gradients for backpropagation straightforward.
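A quick numerical check makes the result concrete. Below is a minimal sketch (assuming NumPy; the `softmax` and `cross_entropy` helpers and all variable names are illustrative, not from the original text) comparing the analytic gradient against a finite-difference estimate of the loss:

```python
import numpy as np

def softmax(o):
    # Shift by the max logit for numerical stability before exponentiating.
    e = np.exp(o - o.max())
    return e / e.sum()

def cross_entropy(o, y):
    # l(y, y_hat) = -sum_j y_j * log softmax(o)_j
    return -np.sum(y * np.log(softmax(o)))

rng = np.random.default_rng(0)
o = rng.normal(size=5)   # unnormalized logits (illustrative)
y = np.zeros(5)
y[2] = 1.0               # one-hot label for class 2

analytic = softmax(o) - y  # the derivative stated above

# Central finite differences, perturbing one logit at a time.
eps = 1e-6
numeric = np.array([
    (cross_entropy(o + eps * np.eye(5)[j], y) -
     cross_entropy(o - eps * np.eye(5)[j], y)) / (2 * eps)
    for j in range(5)
])

print(np.allclose(analytic, numeric, atol=1e-8))  # prints True
```

The two gradients agree to within finite-difference error, confirming that backpropagation through softmax cross-entropy reduces to the simple subtraction $\mathrm{softmax}(\mathbf{o}) - \mathbf{y}$.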

Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L
