Concept

Computing Gradients for Multi-Layer Neural Networks

For a neural network with only one weight layer and a sigmoid output, the gradient of the loss function can be computed directly using the derivative of the cross-entropy loss from logistic regression:

$$\frac{\partial L_{CE}(w,b)}{\partial w_j} = \left(\sigma(w \cdot x + b) - y\right) x_j$$
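As a quick sanity check of this formula, the sketch below computes the analytic gradient for a single sigmoid unit and compares it with a central finite-difference estimate. It is a minimal pure-Python illustration, not code from the source. The function names (`predict`, `ce_grad_w`, `numeric_grad_w`) and the numeric values are hypothetical, and it assumes a binary label y in {0, 1}.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # sigma(w . x + b) for a single sigmoid unit
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def ce_loss(w, b, x, y):
    # Binary cross-entropy for one example with label y in {0, 1}
    y_hat = predict(w, b, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def ce_grad_w(w, b, x, y):
    # Analytic gradient: dL/dw_j = (sigma(w . x + b) - y) * x_j
    error = predict(w, b, x) - y
    return [error * xj for xj in x]

def numeric_grad_w(w, b, x, y, eps=1e-6):
    # Central finite differences, used only to check ce_grad_w
    grads = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grads.append((ce_loss(w_plus, b, x, y) - ce_loss(w_minus, b, x, y)) / (2 * eps))
    return grads

# Hypothetical example values
w, b = [0.5, -0.3, 0.8], 0.1
x, y = [1.0, 2.0, -1.5], 1

analytic = ce_grad_w(w, b, x, y)
numeric = numeric_grad_w(w, b, x, y)
print("analytic:", analytic)
print("numeric: ", numeric)
```

The two printed lists should agree to several decimal places. Each gradient component is just the prediction error scaled by the corresponding input, which is why no chain rule through other layers is needed here.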

Similarly, for a network with one hidden layer and a softmax output, we can use the derivative of the softmax (multinomial cross-entropy) loss, as shown in the accompanying figure.
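For reference, the usual form of this derivative (assumed here to match the figure) is $\frac{\partial L_{CE}}{\partial w_{k,i}} = -(y_k - \hat{y}_k)\, h_i$, where $\hat{y} = \mathrm{softmax}(W h + b)$, $y$ is a one-hot target, and $h$ is the input to the final weight layer (the hidden-layer activations). The pure-Python sketch below checks that formula against finite differences. The names `predict`, `ce_grad_W`, and `numeric_grad_W`, and all numeric values, are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zk - m) for zk in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, h):
    # y_hat = softmax(W h + b); W has one row per output class
    z = [sum(wki * hi for wki, hi in zip(row, h)) + bk for row, bk in zip(W, b)]
    return softmax(z)

def ce_loss(W, b, h, y):
    # Cross-entropy with a one-hot target y: -sum_k y_k log y_hat_k
    y_hat = predict(W, b, h)
    return -sum(yk * math.log(pk) for yk, pk in zip(y, y_hat))

def ce_grad_W(W, b, h, y):
    # Analytic gradient: dL/dW[k][i] = -(y_k - y_hat_k) * h_i
    y_hat = predict(W, b, h)
    return [[-(yk - pk) * hi for hi in h] for yk, pk in zip(y, y_hat)]

def numeric_grad_W(W, b, h, y, eps=1e-6):
    # Central finite differences over every entry of W, for checking only
    grads = []
    for k in range(len(W)):
        row = []
        for i in range(len(W[k])):
            W_plus = [list(r) for r in W]
            W_minus = [list(r) for r in W]
            W_plus[k][i] += eps
            W_minus[k][i] -= eps
            row.append((ce_loss(W_plus, b, h, y) - ce_loss(W_minus, b, h, y)) / (2 * eps))
        grads.append(row)
    return grads

# Hypothetical values: h stands in for the hidden-layer activations
W = [[0.2, -0.5], [0.4, 0.1], [-0.3, 0.6]]
b = [0.0, 0.1, -0.1]
h = [0.7, 0.2]
y = [0, 1, 0]  # one-hot target: class 1

analytic = ce_grad_W(W, b, h, y)
numeric = numeric_grad_W(W, b, h, y)
print(analytic)
print(numeric)
```

Because the softmax probabilities sum to 1, each column of the analytic gradient sums to (approximately) zero across the classes.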

However, these formulas give the gradient only for the final weight layer, the one that feeds the output unit(s). To compute the gradients for the weights of earlier (hidden) layers in a deeper network, we must use the error backpropagation (backprop) algorithm, which applies the chain rule backward from the loss through each layer.
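A minimal sketch of backprop for the smallest deeper case, assuming one hidden layer of sigmoid units, a single sigmoid output, and binary cross-entropy loss. The function names (`forward`, `backprop`, `numeric_grad_W1`) and all numeric values are hypothetical. The output-layer gradient has the same form as the single-layer formula, with the hidden activations h in place of x; the hidden-layer gradient needs one extra chain-rule step through the output weights and the sigmoid derivative h(1 - h). A finite-difference check on the hidden weights confirms the result.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    W1, b1, w2, b2 = params
    # Hidden layer: h_j = sigma(W1[j] . x + b1[j])
    h = [sigmoid(sum(wji * xi for wji, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    # Output unit: y_hat = sigma(w2 . h + b2)
    y_hat = sigmoid(sum(w2j * hj for w2j, hj in zip(w2, h)) + b2)
    return h, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backprop(params, x, y):
    W1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    # Output layer: same form as the single-layer formula, with h in place of x
    delta_out = y_hat - y
    grad_w2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden layer: chain rule through w2 and the sigmoid derivative h * (1 - h)
    delta_hidden = [delta_out * w2j * hj * (1 - hj) for w2j, hj in zip(w2, h)]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    grad_b1 = list(delta_hidden)
    return grad_W1, grad_b1, grad_w2, grad_b2

def numeric_grad_W1(params, x, y, eps=1e-6):
    # Central finite differences over the hidden-layer weights, for checking only
    W1, b1, w2, b2 = params
    grads = []
    for j in range(len(W1)):
        row = []
        for i in range(len(W1[j])):
            W_plus = [list(r) for r in W1]
            W_minus = [list(r) for r in W1]
            W_plus[j][i] += eps
            W_minus[j][i] -= eps
            diff = loss((W_plus, b1, w2, b2), x, y) - loss((W_minus, b1, w2, b2), x, y)
            row.append(diff / (2 * eps))
        grads.append(row)
    return grads

# Hypothetical network: 2 inputs, 3 hidden units, 1 output
params = ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.2], [0.6, -0.8, 0.4], 0.05)
x, y = [1.0, -2.0], 0

grad_W1, grad_b1, grad_w2, grad_b2 = backprop(params, x, y)
num_W1 = numeric_grad_W1(params, x, y)
print(grad_W1)
print(num_W1)
```

Note that `delta_hidden` reuses `delta_out` from the layer above. In a deeper network the same pattern repeats layer by layer, which is what makes backprop efficient compared with differentiating each weight separately.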

[Figure: derivative of the softmax loss]

Updated 2026-06-15

Tags

Data Science
