Learn Before
  • Purpose and Structure of the Feed-Forward Network (FFN) in Transformers

Feed-Forward Network (FFN) Formula and Component Dimensions in Transformers

The standard Feed-Forward Network (FFN) used in Transformer sub-layers is a two-layer network. Its operation can be expressed by the formula:

\text{FFN}(\mathbf{h}) = \sigma(\mathbf{h}\mathbf{W}_h + \mathbf{b}_h)\mathbf{W}_f + \mathbf{b}_f

In this equation, \mathbf{h} is the input vector of dimension d. The components are defined with the following dimensions:

  • First Layer: A linear transformation with a weight matrix \mathbf{W}_h \in \mathbb{R}^{d \times d_h} and a bias vector \mathbf{b}_h \in \mathbb{R}^{d_h}, followed by a non-linear activation function \sigma.
  • Second Layer: A linear transformation with a weight matrix \mathbf{W}_f \in \mathbb{R}^{d_h \times d} and a bias vector \mathbf{b}_f \in \mathbb{R}^{d}, which projects the hidden representation back to the original dimension d.

The dimension of the hidden layer, d_h, is typically larger than the input/output dimension d; a factor of 4 is common (for example, d = 512 with d_h = 2048 in the original Transformer). A common choice for the activation function \sigma(\cdot) in the hidden layer is the Rectified Linear Unit (ReLU).
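The formula and dimensions above can be sketched in NumPy. This is a minimal illustration, not a production implementation; the sizes d = 512 and d_h = 2048 and the small random initialization scale are assumptions chosen for demonstration.

```python
import numpy as np

# Assumed sizes for illustration: model dimension d, hidden dimension d_h.
d, d_h = 512, 2048
rng = np.random.default_rng(0)

W_h = rng.standard_normal((d, d_h)) * 0.02  # first-layer weights, shape (d, d_h)
b_h = np.zeros(d_h)                         # first-layer bias, shape (d_h,)
W_f = rng.standard_normal((d_h, d)) * 0.02  # second-layer weights, shape (d_h, d)
b_f = np.zeros(d)                           # second-layer bias, shape (d,)

def relu(x):
    # ReLU activation: max(0, x) elementwise.
    return np.maximum(0.0, x)

def ffn(h):
    # FFN(h) = ReLU(h @ W_h + b_h) @ W_f + b_f
    return relu(h @ W_h + b_h) @ W_f + b_f

h = rng.standard_normal(d)   # input vector of dimension d
out = ffn(h)
assert out.shape == (d,)     # output dimension matches the input dimension
```

Because the second layer maps from d_h back to d, the FFN's output has the same shape as its input, which is what allows it to be wrapped in a residual connection inside a Transformer block.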

Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences

Related
  • FFN Hidden Size (d_{ffn}) in Transformers


  • An engineer is building a deep neural network for sequence processing. Each layer of the network consists of a self-attention mechanism followed by a position-wise sub-layer. The engineer designs this position-wise sub-layer to be composed of two consecutive linear transformations. What is the most significant negative consequence of omitting a non-linear activation function between these two linear transformations?

  • Analysis of a Position-Wise Sub-Layer

  • A researcher modifies the position-wise sub-layer within a sequence processing model. The standard design for this sub-layer is a sequence of: a linear transformation, a non-linear activation, and a second linear transformation. The researcher's modification adds a second non-linear activation function immediately after the final linear transformation. Which of the following best evaluates the impact of this architectural change?

Learn After
  • ReLU (Rectified Linear Unit)

  • Importance of Activation Function Design in Wide FFNs

  • In a standard two-layer feed-forward network (FFN) within a Transformer, an input vector h has a dimension of d = 512. The network's hidden layer has a dimension of d_h = 2048. The FFN is defined by the operation: Output = σ(h * W_h + b_h) * W_f + b_f, where σ is a non-linear activation function. What must be the dimensions of the weight matrix W_f for the output vector to have the same dimension as the input vector h?

  • Troubleshooting FFN Dimension Mismatch

  • A standard Feed-Forward Network (FFN) in a Transformer model processes an input vector h of dimension d using the formula: FFN(h) = σ(h * W_h + b_h) * W_f + b_f. The intermediate hidden layer has a dimension d_h. Match each component from the formula to its correct description.

  • You’re debugging a Transformer block in an interna...

  • You are reviewing a teammate’s implementation of a...

  • You’re implementing a single Transformer block in ...

  • Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)

  • Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement

  • Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics

  • Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change

  • Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts

  • Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring

  • Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block