Computing the Gradient

For a network with one weight layer and a sigmoid output, we could simply use the derivative of the loss that we used for logistic regression:

$$\frac{\partial L_{CE}(w,b)}{\partial w_j} = (\sigma(w \cdot x + b) - y)\, x_j$$

Or, for a network with one hidden layer and a softmax output, we could use the derivative of the softmax loss, where $\hat{y}_k$ is the softmax probability for class $k$ and $y_k$ is 1 for the correct class and 0 otherwise:

$$\frac{\partial L_{CE}}{\partial w_{k,j}} = (\hat{y}_k - y_k)\, x_j$$

But these derivatives only give correct updates for the last weight layer. The solution for computing the gradient with respect to the earlier layers is an algorithm called error backpropagation, or backprop.
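As a quick illustration, here is a minimal sketch of the single-layer sigmoid case, assuming NumPy; the helper names `ce_loss` and `grad_w` are made up for this example. It checks the analytic gradient $(\sigma(w \cdot x + b) - y)\, x_j$ against a finite-difference estimate of the loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, b, x, y):
    """Binary cross-entropy loss for a single example (x, y)."""
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def grad_w(w, b, x, y):
    """Analytic gradient: dL/dw_j = (sigma(w.x + b) - y) * x_j."""
    return (sigmoid(np.dot(w, x) + b) - y) * x

# Compare the closed-form gradient to a central finite difference.
rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.1
x, y = rng.normal(size=3), 1.0

eps = 1e-6
numeric = np.array([
    (ce_loss(w + eps * np.eye(3)[j], b, x, y)
     - ce_loss(w - eps * np.eye(3)[j], b, x, y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(grad_w(w, b, x, y), numeric))  # True
```

A check like this only validates the output-layer weights; once a hidden layer is added, there is no such closed form for the earlier weights, which is exactly the gap backprop fills.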


Updated 2021-11-04

Tags

Data Science