
GeGLU (GELU-based Gated Linear Unit) Formula

The GeGLU activation function is defined by the following formula:

$$\sigma_{\text{geglu}}(\mathbf{h}) = \sigma_{\text{gelu}}(\mathbf{h}\mathbf{W}_1 + \mathbf{b}_1) \odot (\mathbf{h}\mathbf{W}_2 + \mathbf{b}_2)$$

In this equation, $\mathbf{h}$ represents the input, while $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{b}_1$, and $\mathbf{b}_2$ are learnable model parameters (weights and biases). The function $\sigma_{\text{gelu}}$ is the Gaussian Error Linear Unit (GELU) activation, and $\odot$ denotes the element-wise (Hadamard) product.
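
As an illustration, here is a minimal PyTorch sketch of this formula. The class name `GeGLU`, the split into two `nn.Linear` projections (one per branch), and the dimensions are illustrative assumptions, not part of the original definition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """Minimal GeGLU sketch: sigma_gelu(h W1 + b1) ⊙ (h W2 + b2)."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        # W1, b1 parameterize the gate branch; W2, b2 the value branch.
        self.gate_proj = nn.Linear(d_in, d_out)   # h W1 + b1
        self.value_proj = nn.Linear(d_in, d_out)  # h W2 + b2

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Element-wise product of the GELU-activated gate and the value branch.
        return F.gelu(self.gate_proj(h)) * self.value_proj(h)

# Usage: map a batch of 512-dim inputs to 1024-dim GeGLU outputs.
layer = GeGLU(512, 1024)
out = layer(torch.randn(8, 512))
print(out.shape)  # torch.Size([8, 1024])
```

Note that because each output element needs both a gate projection and a value projection, a GeGLU layer uses roughly twice the parameters of a plain linear-plus-GELU layer with the same input and output sizes.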
