Short Answer

Impact of Hidden Size on Sub-Layer Dimensions

A neural network architecture for language processing contains two main sub-layers, repeated multiple times: a self-attention mechanism and a position-wise feed-forward network. Both sub-layers process vectors of a fixed dimension, known as the hidden size (d). Analyze how this single hyperparameter, d, determines the shape of the primary weight matrices within both the self-attention and the feed-forward network sub-layers.
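
To make the shapes concrete, here is a minimal NumPy sketch, assuming single-head attention, an illustrative hidden size of d = 512, and the common (but not mandatory) convention of a feed-forward inner dimension d_ff = 4d; the specific values are assumptions for illustration, not part of the question:

```python
import numpy as np

d = 512       # hidden size (illustrative value, assumed)
d_ff = 4 * d  # feed-forward inner dimension; 4*d is a common convention, not fixed by d

# Self-attention: query, key, value, and output projections all map d -> d,
# so each weight matrix has shape (d, d)
W_q = np.zeros((d, d))
W_k = np.zeros((d, d))
W_v = np.zeros((d, d))
W_o = np.zeros((d, d))

# Position-wise feed-forward: expand d -> d_ff, then project back d_ff -> d
W_1 = np.zeros((d, d_ff))
W_2 = np.zeros((d_ff, d))

# A token representation of shape (d,) flows through both sub-layers
# and comes out with the same dimension d
x = np.zeros(d)
assert (x @ W_q).shape == (d,)
assert ((x @ W_1) @ W_2).shape == (d,)
```

In multi-head attention the d-to-d projections are typically split across heads, but the concatenated projection shapes still depend only on d, so both sub-layers' weight matrices are determined by this single hyperparameter.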

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science