Formula

Feed-Forward Network (FFN) Formula and Component Dimensions in Transformers

In a Transformer architecture, the Feed-Forward Network (FFN) sub-layer is typically implemented as a two-layer network. The standard mathematical formulation for this FFN is:

$$\mathrm{FFN}(\mathbf{h}) = \sigma(\mathbf{h} \mathbf{W}_h + \mathbf{b}_h) \mathbf{W}_f + \mathbf{b}_f$$

Here, $\mathbf{h}$ is the input vector. The network's parameters consist of:

  • $\mathbf{W}_h \in \mathbb{R}^{d \times d_h}$ and $\mathbf{b}_h \in \mathbb{R}^{d_h}$: The weight matrix and bias vector for the initial linear transformation.
  • $\mathbf{W}_f \in \mathbb{R}^{d_h \times d}$ and $\mathbf{b}_f \in \mathbb{R}^{d}$: The weight matrix and bias vector for the subsequent linear transformation.

The dimension $d$ represents the input and output size, whereas $d_h$ indicates the hidden layer's size. The function $\sigma(\cdot)$ is the non-linear activation function applied in the hidden layer, with the Rectified Linear Unit (ReLU) being a widespread choice.
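The formula above can be sketched directly in NumPy. This is a minimal illustration, not any particular library's implementation; the dimensions $d = 4$ and $d_h = 16$ and the random initialization are assumptions chosen only for the example.

```python
import numpy as np

def relu(x):
    """ReLU activation: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def ffn(h, W_h, b_h, W_f, b_f):
    """Two-layer FFN: linear -> ReLU -> linear, as in the formula above."""
    return relu(h @ W_h + b_h) @ W_f + b_f

# Hypothetical dimensions: model size d, hidden size d_h (often d_h = 4d).
d, d_h = 4, 16
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(d, d_h)), np.zeros(d_h)  # first transformation
W_f, b_f = rng.normal(size=(d_h, d)), np.zeros(d)    # second transformation

h = rng.normal(size=(1, d))          # one input vector of size d
out = ffn(h, W_h, b_h, W_f, b_f)
print(out.shape)                     # (1, 4): output size matches input size d
```

Note that the hidden layer widens the representation from $d$ to $d_h$ and the second transformation projects it back to $d$, so the FFN can be applied position-wise without changing the sequence's feature dimension.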

Updated 2026-05-02
