
Why use a non-linear activation function?

If only linear activation functions are used in a neural network, each layer applies an affine transformation, and the composition of affine transformations is itself affine. The network therefore collapses to a single linear model: with a sigmoid output it is no more expressive than logistic regression, and with a linear output it is no more expressive than linear regression, regardless of how many hidden layers it has. It is only by adding nonlinearity that the network can approximate more complex functions. Concretely, if $\sigma(z^{[1]}) = z^{[1]}$:

$$a^{[1]} = z^{[1]} = W^{[1]} x + b^{[1]}$$

$$a^{[2]} = z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$$

$$\Rightarrow a^{[2]} = W^{[2]} \left( W^{[1]} x + b^{[1]} \right) + b^{[2]} = W^{[2]} W^{[1]} x + W^{[2]} b^{[1]} + b^{[2]} = W' x + b'$$
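This collapse is easy to verify numerically. The sketch below (a minimal illustration with arbitrary layer sizes and random weights, not any particular network from the text) runs two stacked linear layers and checks that the result equals a single affine map $W'x + b'$ with $W' = W^{[2]} W^{[1]}$ and $b' = W^{[2]} b^{[1]} + b^{[2]}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with identity (linear) "activations"; sizes are arbitrary.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Forward pass through the two linear layers.
a1 = W1 @ x + b1          # a[1] = W[1] x + b[1]
a2 = W2 @ a1 + b2         # a[2] = W[2] a[1] + b[2]

# The same map collapses into one affine transform W'x + b'.
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2

assert np.allclose(a2, W_prime @ x + b_prime)
```

Replacing the identity with any non-linear $\sigma$ (e.g. ReLU or sigmoid) between the layers breaks this factorization, which is exactly what gives depth its expressive power.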


Updated 2021-03-12

Tags

Data Science