Gaussian Error Linear Unit (GELU)
The Gaussian Error Linear Unit (GELU) is a prominent alternative to the ReLU activation function in Large Language Models (LLMs), and can be viewed as a smoothed version of ReLU. Instead of gating the output strictly by the sign of the input, GELU weights its input h by the percentile P(h' ≤ h), so that GELU(h) = h · P(h' ≤ h) = h · Φ(h), where Φ is the cumulative distribution function (CDF) of the standard normal distribution. In this formulation, h' represents a d-dimensional vector whose entries are sampled from the standard normal distribution N(0, 1), so P(h' ≤ h) is computed element-wise and produces a vector of percentiles corresponding to the entries of the input h.
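As an illustration (a minimal sketch in plain Python, with no deep-learning framework assumed; the input values are made up for the example), the exact GELU can be computed from the standard normal CDF via the error function and compared element-wise against ReLU:

import math

def gelu(x):
    # Exact GELU: weight the input by Phi(x), the standard normal CDF.
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

# Illustrative inputs (assumed values, not from the source text).
h = [-3.0, -0.1, 0.0, 0.1, 3.0]
print([round(gelu(v), 4) for v in h])  # small negative inputs are damped, not zeroed
print([relu(v) for v in h])            # every negative input is cut off to exactly 0

For large positive inputs GELU approaches the identity, and for large negative inputs it approaches zero, which is why it behaves like a smooth version of ReLU.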

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gaussian Error Linear Unit (GELU)
Gated Linear Unit (GLU)
A machine learning engineer is analyzing the feed-forward network (FFN) component of a transformer model. They want to replace the standard Rectified Linear Unit (ReLU) activation function with a more modern alternative to potentially improve model performance. Which of the following statements best analyzes the rationale for using a function like the Gaussian Error Linear Unit (GELU) or Swish over ReLU in this context?
Match each activation function, which can be used in the feed-forward network of a transformer model, with its corresponding description.
Evaluating an Activation Function Change in a Transformer FFN
Pros and Cons of ReLU
Leaky ReLU
Parametric ReLU
Derivative of ReLU (Rectified Linear Unit) function
A common non-linear activation function is defined by the operation f(x) = max(0, x). If this function is applied element-wise to the input vector h = [2.7, -1.3, 0, -4.5, 8.1], what is the resulting output vector?
A neuron in a neural network computes a pre-activation value (the weighted sum of its inputs plus bias) of -2.8. The neuron then applies an activation function defined by the formula f(z) = max(0, z). Based on this, what will be the neuron's output, and what is the direct consequence for this neuron's learning process during backpropagation for this specific input?
A hidden layer in a neural network produces the following vector of pre-activation values for a single neuron across five different training examples: [-3.1, -0.5, 0.8, 2.4, 5.0]. An activation function defined as f(x) = max(0, x) is then applied to this vector. Which statement best analyzes the effect of this function on the information passed to the next layer?
Gaussian Error Linear Unit (GELU)
Learn After
GELU (Gaussian Error Linear Unit) Formula
Applications of GELU in Large Language Models
An activation function is defined by its behavior of weighting an input value by that value's corresponding cumulative probability from a standard normal distribution (mean=0, variance=1). Given two inputs, x = -3 and y = 3, which statement best describes their respective outputs, f(x) and f(y)?
Hendrycks and Gimpel [2016] on GELU
An activation function is designed to scale its input value by the probability that a randomly drawn value from a standard normal distribution (mean=0, variance=1) is less than or equal to that input. How does this function's output for a small negative input (e.g., -0.1) compare to the output of a function that simply sets all negative inputs to zero?
Activation Function Selection for a Language Model
Diagnosing Training Instability When Changing Normalization and FFN Activations
Choosing an FFN Activation and Normalization Pair Under Deployment Constraints
Explaining a Distribution Shift Caused by Swapping LayerNorm for RMSNorm and GELU for SwiGLU
Root-Cause Analysis of FFN Output Drift After Swapping Normalization and Activation
Selecting a Normalization + FFN Activation Change After Quantization Regressions
Interpreting Activation/Normalization Interactions from FFN Telemetry
You are reviewing a teammate’s proposed Transforme...
In a transformer feed-forward block, your team is ...
You’re debugging a transformer FFN refactor where ...
You’re reviewing a PR that changes a transformer b...