Definition

FFN Hidden Size in Transformers

The feed-forward network (FFN) sub-layers in Transformer models contain a hidden layer whose size is denoted $d_{\textrm{ffn}}$. This dimension is typically larger than the model hidden size $d$; a common architectural choice is $d_{\textrm{ffn}} = 4d$. In more recent, larger-scale Transformers, $d_{\textrm{ffn}}$ is often set to an even larger value to increase model capacity.
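As a minimal sketch of how these dimensions relate, the snippet below builds a position-wise FFN sub-layer in NumPy with the common $d_{\textrm{ffn}} = 4d$ setup. The variable names, sizes, and ReLU activation are illustrative assumptions, not details taken from the source.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the source):
d = 512        # model hidden size
d_ffn = 4 * d  # FFN hidden size, the common 4x setup -> 2048

rng = np.random.default_rng(0)
W1 = rng.standard_normal((d, d_ffn)) * 0.02  # expand: d -> d_ffn
b1 = np.zeros(d_ffn)
W2 = rng.standard_normal((d_ffn, d)) * 0.02  # project back: d_ffn -> d
b2 = np.zeros(d)

def ffn(x):
    """Position-wise FFN: FFN(x) = max(0, x W1 + b1) W2 + b2."""
    h = np.maximum(0.0, x @ W1 + b1)  # hidden activations of width d_ffn
    return h @ W2 + b2

x = rng.standard_normal((10, d))  # a sequence of 10 token positions
y = ffn(x)
print(y.shape)  # (10, 512): output returns to the model hidden size
```

Note that the same two projections are applied independently at every position; only the widths $d$ and $d_{\textrm{ffn}}$ define the layer's shape.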

Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
