Learn Before
Relation

Pros and Cons of ReLU

Pros:

  • Computationally efficient—allows the network to converge very quickly
  • Non-linear—although each piece looks linear, ReLU as a whole is a non-linear function with a well-defined derivative almost everywhere, so it supports backpropagation.
  • If you're not sure which activation function to use for the hidden layers, ReLU is a sensible default.
  • Geoffrey Hinton: allows a neuron to express a strong opinion
  • Gradient doesn't saturate (on the high end)
  • Less sensitive to random initialization
  • Runs great on low precision hardware
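The points above can be sketched in a few lines of NumPy (a minimal illustration, not tied to any particular framework): ReLU is just an elementwise max, and its gradient stays at 1 for arbitrarily large positive inputs, which is why it doesn't saturate on the high end.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied elementwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative: 1 for x > 0, 0 for x < 0
    # (the gradient does not saturate on the high end)
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 100.0])
print(relu(x))       # negative inputs clamp to 0; positives pass through
print(relu_grad(x))  # gradient is 0 for negatives, 1 for positives (even at 100.0)
```

The cheapness comes from the fact that both the forward pass and the gradient are a single comparison per element, with no exponentials, which is also why ReLU maps well to low-precision hardware.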

Cons:

  • The dying ReLU problem (dead neurons): when a neuron's input is negative, the gradient of the function is zero, so no updates flow back through that neuron during backpropagation; a neuron stuck in this regime stops learning. => Solution: Leaky ReLU
  • Not differentiable at the origin: at input 0 the flat piece (slope 0) meets the linear piece (slope 1), so the derivative is undefined there; in practice frameworks just pick 0 or 1 at that point. => Solution: GELU, which is smooth everywhere
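The Leaky ReLU fix mentioned above can be sketched as follows (a minimal illustration; the slope value 0.01 is a common but arbitrary choice): negative inputs get a small non-zero slope instead of a hard zero, so the gradient never fully dies.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: keep a small slope alpha for negative inputs
    # instead of clamping them to zero
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is alpha (not 0) for negative inputs,
    # so a neuron in the negative regime can still learn
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -1.0, 2.0])
print(leaky_relu(x))       # small negative outputs instead of exact zeros
print(leaky_relu_grad(x))  # alpha for negative inputs, 1 for positive
```

Compared with plain ReLU, the only change is replacing the zero branch with `alpha * x`; the non-differentiability at the origin remains, which is what smooth alternatives like GELU address.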

Updated 2021-11-11

Tags

Data Science