Learn Before
ReLU (Rectified Linear Unit)
Pros and Cons of ReLU
Pros:
- Computationally efficient: the forward pass is a single elementwise max, which lets networks converge quickly (a minimal sketch follows this list)
- Non-linear: although it looks like a linear function, ReLU is only piecewise linear, has a usable derivative, and allows for backpropagation.
- If you're not sure what activation function to use for the hidden layers, it's better to use ReLU by default.
- Geoffrey Hinton: allows a neuron to express a strong opinion
- Gradient doesn't saturate (on the high end)
- Less sensitive to random initialization
- Runs great on low precision hardware
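A minimal NumPy sketch of ReLU and its derivative (the function names are illustrative, not from any particular library): it shows that the forward pass is a single elementwise max, and that the gradient stays at 1 for arbitrarily large positive inputs, i.e. it does not saturate on the high end.

```python
import numpy as np

def relu(x):
    # Forward pass: a single elementwise max, which is why ReLU is so cheap to compute.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative: 1 for positive inputs, 0 for negative inputs.
    # It stays at 1 no matter how large x gets, so the gradient never saturates on the high end.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 100.0])
print(relu(x))       # 0, 0, 0, 0.5, 100
print(relu_grad(x))  # 0, 0, 0, 1, 1
```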
Cons:
- The dying ReLU problem (dead neurons): for negative inputs the gradient is exactly zero, so a neuron whose pre-activations stay negative receives no gradient during backpropagation and stops learning (see the sketch below). => Solution: Leaky ReLU
- Gradient discontinuous at the origin: at input 0 the derivative is undefined, because the flat negative half meets the linear positive half at a corner, and the gradient jumps from 0 to 1 there. => Solution: GELU
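A minimal NumPy sketch of the dying-ReLU issue and the Leaky ReLU fix; the negative slope of 0.01 is just a common illustrative default, not a prescribed value.

```python
import numpy as np

def relu_grad(x):
    # Zero gradient for all negative inputs: a neuron whose pre-activations
    # stay negative receives no gradient and stops learning (a "dead" neuron).
    return (x > 0).astype(x.dtype)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope (alpha) on the negative side ...
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # ... so the gradient is never exactly zero and the neuron can recover.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -1.0, 2.0])
print(relu_grad(x))        # 0, 0, 1      -> negative inputs get no gradient
print(leaky_relu_grad(x))  # 0.01, 0.01, 1 -> small but nonzero gradient
```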
Tags
Data Science
Related
Pros and Cons of ReLU
Leaky ReLU
Parametric ReLU
Derivative of ReLU (Rectified Linear Unit) function
Learn After
Why is it better to use ReLU by default?