Gaussian Error Linear Unit (GELU)
The Gaussian Error Linear Unit (GELU) is an activation function often used as an alternative to ReLU, particularly in Large Language Models. It can be conceptualized as a smoother variant of ReLU. Instead of gating inputs based on their sign, GELU weights an input value, x, by Φ(x), the cumulative distribution function (CDF) of the standard normal distribution evaluated at x: GELU(x) = x · Φ(x). In other words, the function scales its input by the probability that a value drawn from a standard Gaussian distribution is less than or equal to the input.
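The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the exact form computes Φ(x) via the error function, and a second helper shows the tanh-based approximation commonly used in transformer implementations.

```python
import math

def gelu(x):
    # Exact GELU: weight the input x by the standard normal CDF Phi(x),
    # where Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Widely used tanh approximation of GELU (e.g. in many transformer FFNs)
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Note the smooth-gating behavior: large positive inputs pass through nearly unchanged (Φ(x) ≈ 1), large negative inputs are suppressed toward zero (Φ(x) ≈ 0), and inputs near zero are partially scaled rather than hard-clipped as in ReLU.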

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gated Linear Unit (GLU)
A machine learning engineer is analyzing the feed-forward network (FFN) component of a transformer model. They want to replace the standard Rectified Linear Unit (ReLU) activation function with a more modern alternative to potentially improve model performance. Which of the following statements best analyzes the rationale for using a function like the Gaussian Error Linear Unit (GELU) or Swish over ReLU in this context?
Match each activation function, which can be used in the feed-forward network of a transformer model, with its corresponding description.
Evaluating an Activation Function Change in a Transformer FFN