A linear activation function has two major problems:

1. Backpropagation (gradient descent) cannot train the model effectively: the derivative of the function is a constant and has no relation to the input, $x$. The gradient therefore carries no input-dependent signal about which weights in the input neurons would give a better prediction.

2. All layers of the neural network collapse into one: with linear activation functions, no matter how many layers the network has, the output of the last layer is a linear function of the input to the first layer (because a composition of linear functions is still a linear function). So a linear activation function turns the neural network into just one layer.
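
The second problem can be checked numerically. The following is a minimal sketch, not taken from the source: the weights `W1` and `W2`, the constant `c`, and the helper names are hypothetical values chosen for illustration, and biases are left out to keep it short. It passes an input through two layers with the activation $y = cx$ and compares the result with a single layer whose weights are the product of the two weight matrices.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def linear_activation(z, c):
    """Linear activation y = c * x, applied to each element."""
    return [c * zi for zi in z]


# Hypothetical weights, chosen only for illustration (biases omitted).
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # layer 1: 2 inputs -> 3 units
W2 = [[2.0, 0.0, 1.0]]                      # layer 2: 3 units -> 1 output
c = 0.5
x = [4.0, -2.0]

# Forward pass through both layers, each followed by the linear activation.
h = linear_activation(matvec(W1, x), c)
y_two_layers = linear_activation(matvec(W2, h), c)

# One equivalent layer: weights c * c * (W2 W1), with no activation at all.
W_combined = [[c * c * w for w in row] for row in matmul(W2, W1)]
y_one_layer = matvec(W_combined, x)

print(y_two_layers, y_one_layer)  # both are [2.5]
```

Adding more linear layers only multiplies in more matrices, so the network never becomes more expressive than the single combined layer.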

A neural network that uses only linear activation functions is simply a linear regression model. It has limited power to handle complex, varying input data.

University of Michigan - Ann Arbor

A linear activation function takes the form: $y = cx$.

It takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows a range of output values, not just yes and no.
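
As a small illustration of the definition above, the sketch below implements $y = cx$ next to a step function. The function names, the value `c = 2.0`, and the step threshold of 0 are assumptions made for the example, not values from the source.

```python
def linear(x, c=2.0):
    """Linear activation: the output is proportional to the input."""
    return c * x


def linear_derivative(x, c=2.0):
    """Derivative of c * x with respect to x: always c, whatever x is."""
    return c


def step(x, threshold=0.0):
    """Step activation: only two possible outputs, 1 (yes) or 0 (no)."""
    return 1 if x >= threshold else 0


for value in [-1.5, 0.5, 3.0]:
    # The linear output varies with the input; the step output is only 0 or 1.
    print(value, linear(value), step(value), linear_derivative(value))
```

The last column printed is the same for every input, which is the constant derivative the first problem refers to.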


Linear Activation Function

A helpful website that introduces neural networks:
https://missinglink.ai/guides/neural-network-concepts/

Neural Network Reference

Problems with Linear Activation Function

Non-linear functions address the problems of a linear activation function:

 - They allow backpropagation because their derivative depends on the input.
 - They allow “stacking” of multiple layers of neurons to create a deep neural network. Multiple hidden layers of neurons are needed to learn complex data sets with high levels of accuracy.
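
To make the first point concrete, here is a minimal sketch using the sigmoid function as one example of a non-linear activation. The choice of sigmoid is an assumption for illustration, since the text above does not name a specific function. Unlike the constant derivative of $y = cx$, its derivative changes with the input.

```python
import math


def sigmoid(x):
    """Sigmoid activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, s * (1 - s), which depends on the input x."""
    s = sigmoid(x)
    return s * (1.0 - s)


for value in [-4.0, 0.0, 4.0]:
    # The derivative is largest at 0 (exactly 0.25) and shrinks away from it.
    print(value, sigmoid(value), sigmoid_derivative(value))
```

Because the derivative differs from one input to another, the gradient passed back through each neuron reflects the input that neuron actually received.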
