Importance of Activation Function Design in Wide FFNs
In the practical implementation of Large Language Models (LLMs), increasing the FFN hidden size, denoted d_h, is generally beneficial for performance. However, training and deploying models with a very large hidden size introduces significant computational challenges. Because of these constraints, the careful design and selection of the activation function plays an especially critical role in the effectiveness of such wide Feed-Forward Networks (FFNs).
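To make this concrete, below is a minimal sketch of the standard two-layer FFN with the activation σ left as a swappable design choice. It assumes PyTorch; the dimensions d = 512 and d_h = 2048 are taken from the related exercises below and are illustrative only.

```python
import torch
import torch.nn as nn

class FFN(nn.Module):
    """Standard two-layer Transformer FFN: sigma(h W_h + b_h) W_f + b_f."""
    def __init__(self, d=512, d_h=2048, activation=nn.ReLU()):
        super().__init__()
        self.w_h = nn.Linear(d, d_h)  # first projection: d -> d_h (W_h, b_h)
        self.w_f = nn.Linear(d_h, d)  # second projection: d_h -> d (W_f, b_f)
        self.act = activation         # sigma: the design choice discussed above

    def forward(self, h):
        return self.w_f(self.act(self.w_h(h)))

ffn = FFN()
h = torch.randn(1, 512)
print(ffn(h).shape)  # torch.Size([1, 512])

# Parameter count is 2*d*d_h + d_h + d, i.e. linear in d_h,
# so widening the hidden layer makes the FFN the dominant cost
# and raises the stakes of the activation choice.
print(sum(p.numel() for p in ffn.parameters()))  # 2,099,712
```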
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
ReLU (Rectified Linear Unit)
Troubleshooting FFN Dimension Mismatch: In a standard two-layer feed-forward network (FFN) within a Transformer, an input vector h has a dimension of d = 512. The network's hidden layer has a dimension of d_h = 2048. The FFN is defined by the operation Output = σ(h * W_h + b_h) * W_f + b_f, where σ is a non-linear activation function. What must be the dimensions of the weight matrix W_f for the output vector to have the same dimension as the input vector h? (A worked shape check appears after this list.)
A standard Feed-Forward Network (FFN) in a Transformer model processes an input vector h of dimension d using the formula FFN(h) = σ(h * W_h + b_h) * W_f + b_f. The intermediate hidden layer has a dimension d_h. Match each component from the formula to its correct description.
You’re debugging a Transformer block in an interna...
You are reviewing a teammate’s implementation of a...
You’re implementing a single Transformer block in ...
Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)
Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement
Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics
Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change
Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts
Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring
Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block
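As a companion to the dimension-mismatch exercise above, here is a minimal shape check. It assumes PyTorch and uses the values from the question (d = 512, d_h = 2048); for the output to match the input dimension, W_f must have shape (d_h, d) = (2048, 512).

```python
import torch

d, d_h = 512, 2048
h   = torch.randn(d)       # input vector of dimension d
W_h = torch.randn(d, d_h)  # first layer maps d -> d_h
b_h = torch.randn(d_h)
W_f = torch.randn(d_h, d)  # must map d_h back to d: shape (2048, 512)
b_f = torch.randn(d)

out = torch.relu(h @ W_h + b_h) @ W_f + b_f
assert out.shape == h.shape  # torch.Size([512])
```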
Learn After
An engineer is designing a neural network for a large language model and observes that the two-layer Feed-Forward Network (FFN) is the primary computational bottleneck during training. The design specifies that the FFN's internal hidden dimension must be significantly larger than its input and output dimensions to ensure high model capacity. Given the goal of reducing the FFN's computational cost while preserving its expressive power, which design choice for the non-linear activation function (applied after the first linear layer) would be most effective? (A rough cost sketch follows this list.)
Evaluating FFN Design Trade-offs in a Resource-Constrained LLM Project
Analyzing the Impact of FFN Width on Activation Function Choice
Computational Impact of Activation Functions in Wide FFNs
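For the bottleneck question above, a rough back-of-the-envelope sketch can frame the trade-off. The per-unit operation counts below are illustrative assumptions, not measurements; the point is only that the two matrix multiplications scale with d * d_h while the activation itself is element-wise in d_h.

```python
d, d_h = 512, 2048

matmul_flops = 2 * (2 * d * d_h)  # two linear layers, ~2*d*d_h FLOPs each
relu_ops = d_h                    # ReLU: one compare/select per hidden unit
gelu_ops = 8 * d_h                # assumed ~8 ops per unit for a tanh-based GELU

print(f"linear layers: {matmul_flops:,} FLOPs per token")  # 4,194,304
print(f"ReLU: {relu_ops:,} ops   GELU: ~{gelu_ops:,} ops")
# The matmuls dominate, so activation designs that let the model keep
# capacity at a smaller d_h (or exploit sparsity) matter more as d_h grows.
```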