
Ramachandran et al. [2017] on the Swish Function

The 2017 paper by Ramachandran et al. is the original source that introduced the Swish activation function. It defines the function by the formula $\sigma_{\text{swish}}(\mathbf{h}) = \mathbf{h} \odot \text{Sigmoid}(c\mathbf{h})$, where $c$ is a constant or a trainable parameter.
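As a rough illustration of the formula above, the following minimal sketch applies Swish element-wise to a plain Python list. The function names `sigmoid` and `swish`, the default `c=1.0`, and the treatment of `c` as a fixed scalar rather than a trainable parameter are assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(h, c=1.0):
    """Element-wise Swish: h_i * Sigmoid(c * h_i) for each entry of h.

    `c` is a fixed scalar here (an assumption for this sketch); in a
    real model it could instead be a trainable parameter.
    """
    return [x * sigmoid(c * x) for x in h]


if __name__ == "__main__":
    print(swish([-2.0, 0.0, 2.0]))
    print(swish([-2.0, 0.0, 2.0], c=5.0))
```

For large positive inputs the sigmoid factor approaches 1, so the output approaches the input itself; for large negative inputs it approaches 0. Increasing `c` sharpens that transition.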


Updated 2025-09-29


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences