Swish Function

The Swish function, introduced by Ramachandran et al. in 2017, is a mathematical function defined as the product of its input and the sigmoid function applied to a scaled version of the input. For a scalar input $x$, it is expressed as:

$$\operatorname{swish}(x) = x \cdot \operatorname{sigmoid}(\beta x) = \frac{x}{1+e^{-\beta x}}$$

where $\beta$ is a constant or a trainable parameter. When applied element-wise to a vector $\mathbf{h}$ in neural networks, the formula is written as:

$$\sigma_{\text{swish}}(\mathbf{h}) = \mathbf{h} \odot \text{Sigmoid}(\beta \mathbf{h})$$

where $\odot$ denotes the element-wise product.
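
The element-wise definition can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `swish`, the default `beta=1.0`, and the tanh-based form of the sigmoid (chosen here to avoid overflow for large inputs) are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    # Identity: 1 / (1 + exp(-z)) == 0.5 * (1 + tanh(z / 2)).
    # The tanh form does not overflow when |z| is large.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def swish(h, beta=1.0):
    """Swish activation: h * sigmoid(beta * h), element-wise.

    beta is treated as a fixed constant here (an assumption for this
    sketch); in a trainable setting it would be a learned parameter.
    """
    h = np.asarray(h, dtype=float)
    return h * sigmoid(beta * h)


# Example input vector (illustrative values only).
h = np.array([-2.0, 0.0, 2.0])
y = swish(h)            # beta = 1
y_half = swish(h, 0.0)  # beta = 0: sigmoid(0) = 0.5, so the output is h / 2
```

Two limiting cases follow directly from the formula and are easy to check with this sketch: with $\beta = 0$ the sigmoid factor is constant at $0.5$, so the output is $\mathbf{h}/2$; as $\beta$ grows large, the sigmoid factor approaches a step function and the output approaches $\max(0, \mathbf{h})$.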

Updated 2026-04-21

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences