Formula

SwiGLU (Swish-based Gated Linear Unit) Formula

The SwiGLU function uses the Swish function, $\sigma_{\mathrm{swish}}$, as its internal non-linear activation. It is defined as:

$$\sigma_{\mathrm{swiglu}}(\mathbf{h}) = \sigma_{\mathrm{swish}}(\mathbf{h}\mathbf{W}_1 + \mathbf{b}_1) \odot (\mathbf{h}\mathbf{W}_2 + \mathbf{b}_2)$$

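where $\mathbf{h}$ is the input vector, $\mathbf{W}_1$ and $\mathbf{W}_2$ are weight matrices, $\mathbf{b}_1$ and $\mathbf{b}_2$ are bias vectors, and $\odot$ denotes the element-wise product.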
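The definition can be checked numerically with a short NumPy sketch. It assumes the common form of Swish, $\sigma_{\mathrm{swish}}(x) = x \cdot \mathrm{sigmoid}(\beta x)$ with $\beta = 1$, and treats the input as a row vector, so the input is multiplied on the left of each weight matrix. The function names, the `beta` parameter, and the dimensions are illustrative choices and do not come from the source.

```python
import numpy as np


def swish(x, beta=1.0):
    """Swish activation, x * sigmoid(beta * x); beta = 1 is an assumed default."""
    # x * sigmoid(beta * x) == x / (1 + exp(-beta * x))
    return x / (1.0 + np.exp(-beta * x))


def swiglu(h, W1, b1, W2, b2):
    """SwiGLU: swish(h W1 + b1) multiplied element-wise by (h W2 + b2)."""
    gate = swish(h @ W1 + b1)   # non-linear gating branch
    value = h @ W2 + b2         # linear branch
    return gate * value         # element-wise product


# Illustrative sizes only: 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8
h = rng.normal(size=d_in)
W1 = rng.normal(size=(d_in, d_hidden))
W2 = rng.normal(size=(d_in, d_hidden))
b1 = np.zeros(d_hidden)
b2 = np.zeros(d_hidden)

out = swiglu(h, W1, b1, W2, b2)
print(out.shape)
```

Both branches read the same input; only the first passes through the non-linearity, so it acts as a gate on the second. The plain `np.exp` call can overflow for very negative inputs, so a production implementation would likely use a numerically stable sigmoid.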

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences