Learn Before
  • ReLU (Rectified Linear Unit)

Pros and Cons of ReLU

Pros:

  • Computationally efficient—allows the network to converge very quickly
  • Non-linear: although it looks like a linear function, ReLU is piecewise linear, so the network as a whole stays non-linear; it has a well-defined derivative everywhere except at 0 and allows for backpropagation (see the sketch after this list).
  • If you're not sure what activation function to use for the hidden layers, it's better to use ReLU by default.
  • Geoffrey Hinton: allows a neuron to express a strong opinion
  • Gradient doesn't saturate (on the high end)
  • Less sensitive to random initialization
  • Runs great on low precision hardware
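
For intuition, here is a minimal NumPy sketch (the names relu and relu_grad are illustrative, not from a library): the forward pass is a single elementwise comparison, which is why it is so cheap, and the gradient is 1 for every positive input, so it never saturates on the high end.

```python
import numpy as np

def relu(x):
    # Forward pass: elementwise max(0, x) -- a single comparison, hence very cheap.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient: 1 for positive inputs, 0 for negative inputs
    # (using the common convention of 0 at exactly x == 0).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```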

Cons:

  • The dying ReLU problem (dead neurons): when a neuron's inputs are negative, its output is zero and the gradient of the function is zero, so no error flows back through that neuron during backpropagation and it stops learning, often permanently. => Solution: Leaky ReLU (see the sketch after this list)
  • Gradient discontinuous at the origin: at input exactly 0 the derivative is undefined, because that point is where the flat segment (slope 0) meets the linear segment (slope 1). Frameworks pick a convention (usually 0) there, but the sharp kink remains. => Solution: GELU, which is smooth everywhere
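
To make the dying-ReLU problem concrete, here is a small NumPy sketch (function names are illustrative): once a neuron's pre-activation is negative, ReLU's gradient is exactly zero, so gradient descent can never push it back into the positive region; Leaky ReLU keeps a small slope alpha on the negative side, so some gradient always flows and the neuron can recover.

```python
import numpy as np

def relu_grad(x):
    # Zero gradient for every negative input: a "dead" neuron stops updating.
    return (x > 0).astype(x.dtype)

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for negative inputs,
    # so the gradient never becomes exactly zero.
    return np.where(x > 0, 1.0, alpha)

pre_activations = np.array([-3.0, -1.0, 2.0])
print(relu_grad(pre_activations))        # [0. 0. 1.]  <- no learning signal for the negative inputs
print(leaky_relu_grad(pre_activations))  # [0.01 0.01 1.  ]
```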

Tags

Data Science

Related
  • Leaky ReLU

  • Parametric ReLU

  • Derivative of ReLU (Rectified Linear Unit) function

Learn After
  • Why is it better to use ReLU by default?