Computing the Gradient

For a network with one weight layer and a sigmoid output, we could simply use the derivative of the loss that we used for logistic regression:

$$\frac{\partial L_{CE}(w,b)}{\partial w_j} = (\sigma(w \cdot x + b) - y)\, x_j$$

Or, for a network with one hidden layer and a softmax output, we could use the derivative of the softmax loss, where $\hat{y}_k$ is the softmax probability for class $k$ and $y_k$ is 1 for the correct class and 0 otherwise:

$$\frac{\partial L_{CE}}{\partial w_{k,j}} = (\hat{y}_k - y_k)\, x_j$$

But these derivatives only give correct updates for the last weight layer. The solution for computing the gradient with respect to the earlier layers is an algorithm called error backpropagation, or backprop.
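As a quick illustration, here is a minimal sketch of the single-layer sigmoid case, assuming NumPy; the helper names `ce_loss` and `grad_w` are made up for this example. It checks the analytic gradient $(\sigma(w \cdot x + b) - y)\, x_j$ against a finite-difference estimate of the loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, b, x, y):
    """Binary cross-entropy loss for a single example (x, y)."""
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def grad_w(w, b, x, y):
    """Analytic gradient: dL/dw_j = (sigma(w.x + b) - y) * x_j."""
    return (sigmoid(np.dot(w, x) + b) - y) * x

# Compare the closed-form gradient to a central finite difference.
rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.1
x, y = rng.normal(size=3), 1.0

eps = 1e-6
numeric = np.array([
    (ce_loss(w + eps * np.eye(3)[j], b, x, y)
     - ce_loss(w - eps * np.eye(3)[j], b, x, y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(grad_w(w, b, x, y), numeric))  # True
```

A check like this only validates the output-layer weights; once a hidden layer is added, there is no such closed form for the earlier weights, which is exactly the gap backprop fills.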


Updated 2021-11-04

Tags

Data Science