
Logistic regression gradient descent

If we only have two features, $x_1$ and $x_2$, then $z = w_1 x_1 + w_2 x_2 + b$, and to minimize the loss function we can apply gradient descent to update $w_1$, $w_2$, and $b$. To compute the derivatives of $\mathcal L(a, y)$ with respect to $w_1$, $w_2$, and $b$, we first need its derivatives with respect to $a$ and $z$.

$$\mathcal L(a, y) = -(y\log(a) + (1 - y)\log(1 - a)) \Rightarrow \frac{d\mathcal L(a, y)}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}$$

$$a = \sigma(z) = \frac{1}{1 + e^{-z}} \Rightarrow \frac{da}{dz} = a(1 - a)$$

By the chain rule:

$$\begin{aligned} \frac{d\mathcal L(a, y)}{dz} &= \frac{d\mathcal L(a, y)}{da}\frac{da}{dz} \\ &= \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a) = a - y \end{aligned}$$

$$\frac{d\mathcal L(a, y)}{dw_1} = \frac{d\mathcal L(a, y)}{dz}\frac{dz}{dw_1} = (a - y)\,x_1$$

$$\frac{d\mathcal L(a, y)}{dw_2} = \frac{d\mathcal L(a, y)}{dz}\frac{dz}{dw_2} = (a - y)\,x_2$$

$$\frac{d\mathcal L(a, y)}{db} = \frac{d\mathcal L(a, y)}{dz}\frac{dz}{db} = (a - y) \cdot 1 = a - y$$
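The gradient-descent update is then $w_i := w_i - \alpha \frac{d\mathcal L}{dw_i}$ and $b := b - \alpha \frac{d\mathcal L}{db}$ for a learning rate $\alpha$. As a minimal sketch of how the derivatives above translate into code (the function names and the learning rate `lr` are illustrative, not from the original note):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: a = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w1, w2, b, x1, x2, y, lr=0.1):
    """One gradient-descent update for a single example (x1, x2, y)."""
    z = w1 * x1 + w2 * x2 + b   # z = w1*x1 + w2*x2 + b
    a = sigmoid(z)              # a = sigma(z)
    dz = a - y                  # dL/dz  = a - y
    dw1 = dz * x1               # dL/dw1 = (a - y) * x1
    dw2 = dz * x2               # dL/dw2 = (a - y) * x2
    db = dz                     # dL/db  = a - y
    # Move each parameter in the negative gradient direction
    return w1 - lr * dw1, w2 - lr * dw2, b - lr * db

# Toy usage: repeated steps on a single labeled example
w1, w2, b = 0.0, 0.0, 0.0
for _ in range(100):
    w1, w2, b = gradient_step(w1, w2, b, x1=0.5, x2=-1.2, y=1.0)
```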

Updated 2021-11-02

Tags

Data Science
