1Cademy - A team of engineers is designing a large neural network for a complex language task. Within each block of their model, they use a sub-network composed of two linear transformations with a non-linearity in between. They are debating whether to make the dimensionality of the intermediate layer in this sub-network significantly larger (e.g., four times larger) than the model's primary embedding and hidden state dimension. What is the primary trade-off they must consider when making this decision?

Learn Before

FFN Hidden Size in Transformers

Multiple Choice

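To make the cost side of this decision concrete, the sketch below counts the parameters and approximate per-token FLOPs of the sub-network described in the question (Linear, non-linearity, Linear) as the intermediate size grows. It is a rough illustration under stated assumptions: both linear layers have biases, one multiply-add is counted as two FLOPs, the non-linearity's cost is ignored, and `d_model = 512` plus the function names `ffn_param_count` and `ffn_flops_per_token` are hypothetical choices made only for this example. It quantifies what a larger intermediate layer costs; the representational benefit it may buy cannot be computed this way.

```python
def ffn_param_count(d_model: int, d_ff: int, bias: bool = True) -> int:
    """Parameters in Linear(d_model -> d_ff), non-linearity, Linear(d_ff -> d_model)."""
    weights = d_model * d_ff + d_ff * d_model  # W1 (d_model x d_ff) and W2 (d_ff x d_model)
    biases = (d_ff + d_model) if bias else 0   # b1 has d_ff entries, b2 has d_model entries
    return weights + biases


def ffn_flops_per_token(d_model: int, d_ff: int) -> int:
    """Approximate FLOPs for one token: two matmuls, 2 FLOPs per multiply-add.

    Assumption: bias additions and the non-linearity are ignored.
    """
    return 2 * (d_model * d_ff) + 2 * (d_ff * d_model)


if __name__ == "__main__":
    d_model = 512  # hypothetical hidden size, chosen only for illustration
    for mult in (1, 2, 4, 8):
        d_ff = mult * d_model
        print(
            f"d_ff = {mult}x d_model ({d_ff}): "
            f"params = {ffn_param_count(d_model, d_ff):,}, "
            f"FLOPs/token = {ffn_flops_per_token(d_model, d_ff):,}"
        )
```

Both quantities grow linearly with the intermediate size, so moving from 1x to 4x roughly quadruples this sub-network's parameters, memory footprint, and compute per token.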

Updated 2025-09-28
