Concept
Computing Gradients for Multi-Layer Neural Networks
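For a neural network with only one weight layer and a sigmoid output, the gradient of the loss function can be computed directly using the derivative of the cross-entropy loss from logistic regression:
$$\frac{\partial L_{CE}(\hat{y}, y)}{\partial w_j} = -(y - \hat{y})\,x_j = -\bigl(y - \sigma(\mathbf{w}\cdot\mathbf{x} + b)\bigr)\,x_j$$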
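As a minimal sketch in plain Python (no libraries; the function names `sigmoid`, `cross_entropy_loss` and `cross_entropy_grad` are hypothetical, chosen for illustration), the formula above translates directly into code. The error term $\hat{y} - y$ is computed once and then scaled by each input $x_j$. The bias gradient is the same error term, because $b$ has an implicit input of 1.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy_loss(w, b, x, y):
    """Binary cross-entropy L_CE = -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    y_hat = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cross_entropy_grad(w, b, x, y):
    """Analytic gradient for a single example.

    dL/dw_j = (sigmoid(w.x + b) - y) * x_j   (equal to -(y - y_hat) * x_j)
    dL/db   =  sigmoid(w.x + b) - y
    """
    y_hat = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    error = y_hat - y  # shared error term
    return [error * xj for xj in x], error
```

A central finite difference, $(L(w_j + \epsilon) - L(w_j - \epsilon)) / 2\epsilon$, computed with `cross_entropy_loss`, should agree with the analytic gradient to several decimal places. This is a common sanity check for hand-derived gradients.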
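Similarly, for a network with one weight layer and a softmax output over $K$ classes, we can use the derivative of the softmax (multinomial cross-entropy) loss, where $\mathbf{y}$ is the one-hot target vector:
$$\frac{\partial L_{CE}}{\partial w_{k,i}} = -(y_k - \hat{y}_k)\,x_i = -\left(y_k - \frac{\exp(\mathbf{w}_k\cdot\mathbf{x} + b_k)}{\sum_{j=1}^{K}\exp(\mathbf{w}_j\cdot\mathbf{x} + b_j)}\right)x_i$$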
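The softmax case can be sketched the same way, assuming a weight matrix stored as $K$ rows (one per class) and a one-hot target list. All names here are illustrative, not from any library. The max logit is subtracted before exponentiating. This keeps `math.exp` from overflowing and leaves the softmax output unchanged.

```python
import math

def softmax(z):
    """Numerically stable softmax: subtracting max(z) does not change the result."""
    m = max(z)
    exps = [math.exp(zk - m) for zk in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    """z_k = w_k . x + b_k for each class row w_k of W."""
    return [sum(wki * xi for wki, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]

def softmax_ce_loss(W, b, x, y):
    """Cross-entropy -sum_k y_k log(y_hat_k) for a one-hot target y."""
    y_hat = softmax(logits(W, b, x))
    return -sum(yk * math.log(pk) for yk, pk in zip(y, y_hat))

def softmax_ce_grad(W, b, x, y):
    """dL/dw_{k,i} = (y_hat_k - y_k) * x_i  and  dL/db_k = y_hat_k - y_k."""
    y_hat = softmax(logits(W, b, x))
    err = [pk - yk for pk, yk in zip(y_hat, y)]  # one error term per class
    grad_W = [[ek * xi for xi in x] for ek in err]
    return grad_W, err
```

Because both $\hat{\mathbf{y}}$ and the one-hot $\mathbf{y}$ sum to 1, the bias gradients in this sketch always sum to zero. That property, together with a finite-difference comparison against `softmax_ce_loss`, makes a quick correctness check.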
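However, these direct derivatives give correct updates only for the last weight layer. The loss depends on the weights of earlier layers only indirectly, through every layer that follows them. Computing their gradients in a deeper network therefore requires the error backpropagation (backprop) algorithm, which applies the chain rule layer by layer from the output back toward the input.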
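To illustrate, here is a minimal backprop sketch. It assumes a network with one hidden layer of sigmoid units and a single sigmoid output trained with binary cross-entropy. These architecture choices and the function names are illustrative assumptions. The output-layer gradient has exactly the logistic-regression form, with the hidden activations $h$ playing the role of the input. The hidden-layer gradient is obtained by sending that error back through the output weights and the sigmoid derivative $h(1-h)$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, b1, w2, b2, x):
    """Forward pass: h = sigmoid(W1 x + b1), y_hat = sigmoid(w2 . h + b2)."""
    h = [sigmoid(sum(wji * xi for wji, xi in zip(wj, x)) + bj) for wj, bj in zip(W1, b1)]
    y_hat = sigmoid(sum(vj * hj for vj, hj in zip(w2, h)) + b2)
    return h, y_hat

def loss(W1, b1, w2, b2, x, y):
    """Binary cross-entropy of the network output for one example."""
    _, y_hat = forward(W1, b1, w2, b2, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backprop(W1, b1, w2, b2, x, y):
    """Gradients of the loss for every weight layer, via the chain rule."""
    h, y_hat = forward(W1, b1, w2, b2, x)
    # Output layer: same form as logistic regression, with h as the input.
    delta_out = y_hat - y                       # dL/dz2
    grad_w2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden layer: dL/dz1_j = dL/dz2 * w2_j * sigmoid'(z1_j), with sigmoid' = h(1 - h).
    delta_hidden = [delta_out * vj * hj * (1 - hj) for vj, hj in zip(w2, h)]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_w2, grad_b2
```

The hidden-layer gradients `grad_W1` and `grad_b1` could not be written with the single-layer formulas above. They only appear once the output error `delta_out` is propagated backward. Deeper networks repeat the same hidden-layer step once per layer.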

Updated 2026-06-15
Tags
Data Science