Relationship Between the Swish Function and Other Activation Functions

The Swish function is defined as f(x) = x · σ(βx), where σ is the sigmoid function and β is a constant or trainable parameter. For β = 1, Swish is equivalent to the Sigmoid-weighted Linear Unit (SiL) used in reinforcement learning, whereas for β = 0 it reduces to the scaled linear function f(x) = x/2, since σ(0) = 1/2. As β → ∞, the sigmoid component approaches a 0-1 step function, so Swish approaches the ReLU function. Swish can therefore be viewed as a smooth function that nonlinearly interpolates between the linear function and ReLU.
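The limiting cases can be checked numerically. Below is a minimal NumPy sketch (the helper names `sigmoid` and `swish` are illustrative, not from the source) verifying the β = 0, β = 1, and β → ∞ behaviour described above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    """Swish activation: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

x = np.linspace(-5, 5, 101)

# beta = 0: sigmoid(0) = 1/2, so Swish reduces to the scaled linear function x/2
assert np.allclose(swish(x, beta=0.0), x / 2)

# beta = 1: Swish coincides with the SiL function, x * sigmoid(x)
assert np.allclose(swish(x, beta=1.0), x * sigmoid(x))

# beta -> infinity: sigmoid(beta * x) approaches a 0-1 step, so Swish approaches ReLU;
# the maximum deviation from ReLU shrinks as beta grows
relu = np.maximum(x, 0)
print(np.max(np.abs(swish(x, beta=50.0) - relu)))
```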

Updated 2020-06-25

Tags

Data Science