Learn Before
  • Gaussian Error Linear Unit (GELU)

GELU (Gaussian Error Linear Unit) Formula

The Gaussian Error Linear Unit (GELU) activation function is defined by the following formula, which is applied element-wise to an input vector $\mathbf{h}$:

$$\sigma_{\text{gelu}}(\mathbf{h}) = \mathbf{h} \, \text{Pr}(h \le \mathbf{h})$$

Here, $h$ is a random variable following the standard normal distribution, $\mathcal{N}(0, 1)$. The term $\text{Pr}(h \le \mathbf{h})$ is an informal notation representing the cumulative distribution function (CDF) of the standard normal distribution, commonly denoted by $\Phi(\mathbf{h})$. When applied to the input vector $\mathbf{h}$, this term results in a new vector where each entry is the percentile (CDF value) corresponding to the respective entry in $\mathbf{h}$. Therefore, the formula can be simplified to:

$$\sigma_{\text{gelu}}(\mathbf{h}) = \mathbf{h} \, \Phi(\mathbf{h})$$
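
For concreteness, the following is a minimal Python sketch of this exact formulation (not a framework implementation), using the standard identity $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x / \sqrt{2}\right)\right)$ to express the normal CDF through the error function in Python's standard library; the function name gelu and the sample vector are illustrative choices, not part of the original definition.

    import math

    def gelu(x: float) -> float:
        # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
        # computed here as Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    # Applied element-wise to an input vector h, as in the formula above.
    h = [-2.0, -0.5, 0.0, 0.5, 2.0]
    print([round(gelu(x), 4) for x in h])
    # Negative entries are damped toward zero (Phi(x) is small but nonzero),
    # while large positive entries pass through nearly unchanged (Phi(x) -> 1).

In practice, deep learning libraries expose this activation directly (for example, torch.nn.functional.gelu in PyTorch); the sketch above only makes the element-wise definition concrete.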

Related
  • Applications of GELU in Large Language Models