1Cademy - Swish Function

Learn Before

Non-Linear Activation Functions

Concept

Swish Function

The Swish function, introduced by Ramachandran et al. in 2017, is an activation function defined as the product of its input and the sigmoid function applied to a scaled version of that input. For a scalar input $x$, it is expressed as:

$\operatorname{swish}(x) = x \cdot \operatorname{sigmoid}(\beta x) = \frac{x}{1+e^{-\beta x}}$

where $\beta$ is a constant or a trainable parameter. When applied element-wise to a vector $\mathbf{h}$ in neural networks, the formula is written as:

$\sigma_{\text{swish}}(\mathbf{h}) = \mathbf{h} \odot \operatorname{sigmoid}(\beta \mathbf{h})$
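
The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names `sigmoid`, `swish`, and `swish_vector`, and the default `beta=1.0`, are assumptions chosen for the example, and $\beta$ is treated as a fixed constant rather than a trainable parameter.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def swish(x: float, beta: float = 1.0) -> float:
    """Scalar Swish: x * sigmoid(beta * x). The default beta is an assumption."""
    return x * sigmoid(beta * x)


def swish_vector(h, beta: float = 1.0):
    """Element-wise Swish over a sequence of floats (h ⊙ sigmoid(beta * h))."""
    return [swish(v, beta) for v in h]


if __name__ == "__main__":
    h = [-2.0, -0.5, 0.0, 0.5, 2.0]
    for beta in (0.0, 1.0, 10.0):
        print(beta, [round(v, 4) for v in swish_vector(h, beta)])
```

Two limiting cases follow directly from the definition and can be checked with this sketch: with $\beta = 0$ the sigmoid factor is exactly $1/2$, so the function reduces to $x/2$, and as $\beta$ grows large the sigmoid factor approaches a step function, so the output approaches $\max(0, x)$.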