Concept

Gaussian Error Linear Unit (GELU)

The Gaussian Error Linear Unit (GeLU) is a prominent alternative to the ReLU activation function in Large Language Models (LLMs), effectively acting as a smoothed version of it. Instead of gating outputs strictly by the sign of the input, the GeLU function weights its input by the percentile $\Pr(h \le \mathbf{h})$. In this formulation, $h$ is a $d$-dimensional vector whose entries are sampled from the standard normal distribution $\mathrm{Gaussian}(0,1)$, so $\Pr(h \le \mathbf{h})$ is a vector of percentiles corresponding element-wise to the entries of the input $\mathbf{h}$.
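
Equivalently, $\mathrm{GeLU}(\mathbf{h}) = \mathbf{h} \odot \Pr(h \le \mathbf{h}) = \mathbf{h} \odot \Phi(\mathbf{h})$, where $\Phi$ is the standard normal CDF applied element-wise. As a rough illustration (not part of the original page, and with hypothetical function names), the NumPy sketch below computes this exact form via the error function, alongside the tanh approximation that is widely used in practice:

```python
import numpy as np
from scipy.special import erf  # error function, used to build the normal CDF

def gelu_exact(h: np.ndarray) -> np.ndarray:
    """Exact GeLU: h * Pr(x <= h) for x ~ Gaussian(0, 1), element-wise."""
    phi = 0.5 * (1.0 + erf(h / np.sqrt(2.0)))  # standard normal CDF
    return h * phi

def gelu_tanh(h: np.ndarray) -> np.ndarray:
    """Common tanh approximation of GeLU (Hendrycks & Gimpel, 2016)."""
    return 0.5 * h * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (h + 0.044715 * h**3)))

h = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu_exact(h))  # small negative inputs are damped, not zeroed as in ReLU
print(gelu_tanh(h))   # closely tracks the exact values
```

Note how the gating is soft: a slightly negative input is multiplied by its small percentile rather than clipped to zero, which is what makes GeLU a smooth counterpart to ReLU.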

