Learn Before
ReLU (Rectified Linear Unit)
Pros and Cons of ReLU
Pros:
- Computationally efficient: the forward pass is just a comparison and the gradient is either 0 or 1, which helps networks train and converge quickly
- Non-linear: although it is built from two linear pieces, ReLU is a non-linear function with a simple derivative (almost everywhere), so it works with backpropagation
- A sensible default: if you're not sure which activation function to use for the hidden layers, ReLU is a good default choice
- Geoffrey Hinton: allows a neuron to express a strong opinion
- Gradient doesn't saturate (on the high end)
- Less sensitive to random initialization
- Runs well on low-precision hardware
Cons:
- The dying ReLU problem (dead neurons): when a neuron's inputs are negative, its output and its gradient are both zero, so no error signal flows back through it and the weights feeding it stop updating; the neuron can get stuck and never learn again. => Solution: Leaky ReLU (see the sketch after this list)
- Gradient discontinuous at the origin: at x = 0 the derivative is undefined, because that is the kink where the flat piece meets the linear piece; in practice frameworks simply pick 0 or 1 there, but the sharp corner can still make optimization less smooth. => Solution: GELU
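To make the bullets above concrete, here is a minimal NumPy sketch (an illustration, not part of the original card) of ReLU, its gradient, and the two remedies mentioned in the cons: Leaky ReLU for dying neurons and a tanh approximation of GELU for the kink at zero. The function names and the example vector z are assumptions chosen for illustration.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 for x > 0 and 0 for x < 0; at exactly 0 the derivative is
    # undefined, so (like most frameworks) we just pick a convention (0 here).
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha on the negative side so the gradient is never
    # exactly zero -- the usual remedy for the dying-ReLU problem.
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # Tanh approximation of GELU: a smooth curve with no kink at the origin.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

z = np.array([-2.0, -0.3, 0.0, 1.5, 4.0])   # illustrative pre-activations
print(relu(z))        # negative entries are clamped to 0
print(relu_grad(z))   # zero gradient for the clamped entries: no learning signal
print(leaky_relu(z))  # negative entries shrink but still pass a gradient of alpha
print(gelu(z))        # smooth near 0, close to ReLU for large |x|
```

In a real network you would normally use the activations provided by your framework (for example torch.nn.ReLU, torch.nn.LeakyReLU, torch.nn.GELU) rather than hand-rolled versions like these.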
Related
Leaky ReLU
Parametric ReLU
Derivative of ReLU (Rectified Linear Unit) function
A common non-linear activation function is defined by the operation f(x) = max(0, x). If this function is applied element-wise to the input vector h = [2.7, -1.3, 0, -4.5, 8.1], what is the resulting output vector?

A neuron in a neural network computes a pre-activation value (the weighted sum of its inputs plus bias) of -2.8. The neuron then applies an activation function defined by the formula f(z) = max(0, z). Based on this, what will be the neuron's output, and what is the direct consequence for this neuron's learning process during backpropagation for this specific input?

A hidden layer in a neural network produces the following vector of pre-activation values for a single neuron across five different training examples: [-3.1, -0.5, 0.8, 2.4, 5.0]. An activation function defined as f(x) = max(0, x) is then applied to this vector. Which statement best analyzes the effect of this function on the information passed to the next layer?
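If you want to sanity-check the element-wise behaviour these questions describe, here is a tiny pure-Python sketch (illustrative only; only the vector h and the value -2.8 come from the question statements):

```python
# Apply f(x) = max(0, x) element-wise to the vector from the first question.
h = [2.7, -1.3, 0, -4.5, 8.1]
print([max(0, x) for x in h])   # negative entries become 0; the rest pass through

# A pre-activation of -2.8 (second question) gives output 0 and, under the usual
# convention, a zero gradient, so this input contributes no weight update.
z = -2.8
print(max(0, z), 1.0 if z > 0 else 0.0)
```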
Learn After
Why is it better to use ReLU by default?