1Cademy - A researcher modifies the position-wise sub-layer within a sequence processing model. The standard design for this sub-layer is a sequence of: a linear transformation, a non-linear activation, and a second linear transformation. The researchers modification adds a second non-linear activation function immediately after the final linear transformation. Which of the following best evaluates the impact of this architectural change?

Learn Before

Purpose and Structure of the Feed-Forward Network (FFN) in Transformers

Multiple Choice

A researcher modifies the position-wise sub-layer within a sequence processing model. The standard design for this sub-layer is a sequence of: a linear transformation, a non-linear activation, and a second linear transformation. The researcher's modification adds a second non-linear activation function immediately after the final linear transformation. Which of the following best evaluates the impact of this architectural change?

Updated 2025-10-06

Contributors are:

Who are from:

Learn Before

Related