
Why use a non-linear activation function?

If only linear activation functions are used in a neural network, each layer applies an affine transformation, and the composition of affine transformations is itself affine. The network therefore collapses to a single linear model: with a sigmoid output it is no more expressive than logistic regression, and with a linear output it is no more expressive than linear regression, regardless of how many hidden layers it has. It is only by adding nonlinearity that the network can approximate more complex functions. Concretely, if $\sigma(z^{[1]}) = z^{[1]}$:

$$a^{[1]} = z^{[1]} = W^{[1]} x + b^{[1]}$$

$$a^{[2]} = z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$$

$$\Rightarrow a^{[2]} = W^{[2]} \left( W^{[1]} x + b^{[1]} \right) + b^{[2]} = W^{[2]} W^{[1]} x + W^{[2]} b^{[1]} + b^{[2]} = W' x + b'$$
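This collapse is easy to verify numerically. The sketch below (a minimal illustration with arbitrary layer sizes and random weights, not any particular network from the text) runs two stacked linear layers and checks that the result equals a single affine map $W'x + b'$ with $W' = W^{[2]} W^{[1]}$ and $b' = W^{[2]} b^{[1]} + b^{[2]}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with identity (linear) "activations"; sizes are arbitrary.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Forward pass through the two linear layers.
a1 = W1 @ x + b1          # a[1] = W[1] x + b[1]
a2 = W2 @ a1 + b2         # a[2] = W[2] a[1] + b[2]

# The same map collapses into one affine transform W'x + b'.
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2

assert np.allclose(a2, W_prime @ x + b_prime)
```

Replacing the identity with any non-linear $\sigma$ (e.g. ReLU or sigmoid) between the layers breaks this factorization, which is exactly what gives depth its expressive power.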


Updated 2021-03-12

Tags

Data Science