Backpropagation is a systematic computational procedure for applying the chain rule to calculate gradients automatically. It operates by traversing a computational graph in a backwards direction—from the output loss back to the input parameters—multiplying matrices of partial derivatives at each step to determine how parameters affect the final output.

Backpropagation

For a neural network with only one weight layer and a sigmoid output, the gradient of the loss function can be computed directly using the derivative of the cross-entropy loss from logistic regression:

$$\frac{\partial L_{CE}(w,b)}{\partial w_j} = (\sigma(w \cdot x + b) - y) x_j$$

Similarly, for a network with one hidden layer and a softmax output, we can use the derivative of the softmax loss (as shown in the node image).

However, these direct derivatives only provide correct updates for the final weight layer. To compute the gradient for all previous hidden layers in a deeper network, we must use the error backpropagation (backprop) algorithm.

University of Michigan - Ann Arbor

Google

(1) initialize parameters

(2) Implement the forward propagation to get the prediction 

(3) compute the loss 

(4) Implement the backward propagation to compute the derivatives of the parameters 

(5) update the parameters 

(6) repeat steps (2) to (5) to reduce the cost.

The learning circle of the neural network

https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning

Neural Networks and Deep Learning

Assume that only one class is the correct one and that there is one output unit in y for each class. y is a one-hot vector.
The cross-entropy loss is simply the log of the output probability corresponding to the correct class.

$LCE(yˆ, y) =\frac{−log exp(zi)}{sum(j=1,k, exp(zj))}$

Learn Before

Related

Learn After