(Batch) Gradient Descent (Deep Learning Optimization Algorithm)

Assume that the error function is $J(w)$ with one parameter $w$. To minimize the error, we can update the weight $w$ as follows:

$w = w - \alpha * \frac{dJ(w)}{dw}$

where $\alpha$ is the learning rate, and $\frac{dJ(w)}{dw}$ is the derivative of $J(w)$ with respect to $w$.
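As a quick sketch of this rule in code, the loop below minimizes a made-up error function $J(w) = (w - 3)^2$ (a hypothetical example, not from this card), whose derivative is $\frac{dJ(w)}{dw} = 2(w - 3)$:

```python
# Minimal gradient descent sketch for a hypothetical error function
# J(w) = (w - 3)**2, whose derivative is dJ/dw = 2 * (w - 3).

def dJ_dw(w):
    """Derivative of the example error function J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w = 0.0      # arbitrary initial weight
alpha = 0.1  # learning rate

for step in range(100):
    w = w - alpha * dJ_dw(w)  # w = w - alpha * dJ(w)/dw

print(w)  # converges toward the minimum at w = 3
```

With $\alpha = 0.1$, each step shrinks the distance to the minimum by a factor of $1 - 2\alpha = 0.8$, so $w$ steadily approaches 3.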

If the error function has two or more parameters, for example a weight $w$ and a bias $b$, we update each of them with its own partial derivative:

$w = w - \alpha * \frac{\partial J(w,b)}{\partial w}$

$b = b - \alpha * \frac{\partial J(w,b)}{\partial b}$

where $\partial$ is a stylized cursive $d$, denoting a partial derivative.
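The two-parameter rule can be sketched the same way. Below is a small, illustrative batch gradient descent loop for a linear model $\hat{y} = w x + b$ with mean-squared error on toy data (all values are made up); it is the "batch" variant because each step averages the gradient over the entire dataset:

```python
import numpy as np

# Toy dataset generated by y = 2x + 1 (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0  # initial parameters
alpha = 0.05     # learning rate

for step in range(2000):
    error = (w * x + b) - y           # prediction error over the full batch
    dJ_dw = 2.0 * np.mean(error * x)  # dJ/dw of the mean squared error
    dJ_db = 2.0 * np.mean(error)      # dJ/db of the mean squared error
    w = w - alpha * dJ_dw
    b = b - alpha * dJ_db

print(w, b)  # approaches w = 2, b = 1
```

Note that both partial derivatives are evaluated at the current $(w, b)$ before either parameter is updated.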


Updated 2021-11-19

Tags

Data Science