An engineer is designing a neural network for a large language model and observes that the two-layer Feed-Forward Network (FFN) component is the primary computational bottleneck during training. The design specifies that the FFN's internal hidden layer dimension must be significantly larger than its input and output dimensions to ensure high model capacity. Given the goal of reducing the computational cost of the FFN while preserving its expressive power, which of the following design choices for the non-linear activation function (applied after the first linear layer) would be most effective?
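To ground the trade-off the question describes, here is a minimal sketch (illustrative dimensions, not values from the question) of a per-token FLOP estimate for a two-layer FFN of the form FFN(x) = W2 · act(W1 · x). It shows that the two projections through the wide hidden layer dominate compute, while the elementwise activation itself is negligible.

```python
def ffn_flops(d_model: int, d_hidden: int) -> int:
    """Approximate FLOPs per token for FFN(x) = W2 @ act(W1 @ x).

    Each linear layer costs roughly 2 * in_dim * out_dim FLOPs
    (one multiply and one add per weight); the elementwise
    activation costs O(d_hidden), which is negligible next to
    the matrix multiplies when d_hidden is large.
    """
    up = 2 * d_model * d_hidden    # first projection W1: d_model -> d_hidden
    down = 2 * d_hidden * d_model  # second projection W2: d_hidden -> d_model
    act = d_hidden                 # elementwise activation (e.g. ReLU/GELU)
    return up + down + act

# With a typical 4x expansion (illustrative: d_model=1024, d_hidden=4096),
# the two projections account for essentially all of the cost:
print(ffn_flops(1024, 4096))  # 16781312, of which only 4096 is the activation
```

This is why the question frames the activation choice as a lever on total FFN cost: an activation (or gating scheme) that lets the hidden dimension shrink without losing expressiveness cuts both projection terms, whereas merely swapping one cheap elementwise function for another barely moves the total.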
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating FFN Design Trade-offs in a Resource-Constrained LLM Project
Analyzing the Impact of FFN Width on Activation Function Choice
Computational Impact of Activation Functions in Wide FFNs