1Cademy - Calculating Parameter Impact of FFN Expansion

Learn Before

FFN Hidden Size in Transformers

Short Answer

Calculating Parameter Impact of FFN Expansion

A model's architecture includes a feed-forward sub-layer composed of two linear transformations with an intermediate expansion. The model's main hidden size, $d$, is 512. The intermediate layer's size, $d_{ffn}$, is initially set to 2048. If an engineer increases this intermediate size to 3072 to improve model capacity, how many additional parameters are introduced into this specific sub-layer? (You can ignore bias terms in your calculation.)
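One way to check the arithmetic is the short sketch below. It assumes the standard two-matrix feed-forward layout, with one weight matrix of shape $d \times d_{ffn}$ for the expansion and one of shape $d_{ffn} \times d$ for the projection back, so the bias-free weight count is $2 \cdot d \cdot d_{ffn}$. The function name `ffn_weight_count` is hypothetical and used only for illustration; a gated variant with a third matrix would give a different count.

```python
def ffn_weight_count(d: int, d_ffn: int) -> int:
    """Bias-free weight count for an assumed two-matrix FFN.

    Up-projection:   d x d_ffn weights
    Down-projection: d_ffn x d weights
    """
    return d * d_ffn + d_ffn * d


d = 512  # main hidden size from the question

before = ffn_weight_count(d, 2048)  # original intermediate size
after = ffn_weight_count(d, 3072)   # expanded intermediate size
added = after - before              # equals 2 * d * (3072 - 2048)

print(f"before: {before:,}")  # before: 2,097,152
print(f"after:  {after:,}")   # after:  3,145,728
print(f"added:  {added:,}")   # added:  1,048,576
```

Because the count is linear in $d_{ffn}$, the increase depends only on the change in intermediate size: $2 \cdot 512 \cdot (3072 - 2048) = 1{,}048{,}576$ additional weights.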

Updated 2025-10-08
