Learn Before
Matching

In a standard Transformer model's architecture, various components have specific dimensionalities defined by key hyperparameters. Match each component listed below with its correct dimensionality, using the following notation: d represents the hidden size, d_ffn is the size of the feed-forward network's inner layer, and n_head is the number of attention heads.
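As background for the matching exercise, the sketch below (not part of the original question; the example values d=512, d_ffn=2048, n_head=8 are illustrative assumptions taken from the original Transformer configuration) shows how the three hyperparameters determine the shapes of the main weight matrices in a standard Transformer layer.

```python
# Illustrative sketch: typical dimensionalities in a standard Transformer.
# Example hyperparameter values are assumptions, not from the question above.

d = 512        # hidden size
d_ffn = 2048   # feed-forward network inner layer size
n_head = 8     # number of attention heads

# The hidden size is split evenly across heads.
d_head = d // n_head  # per-head dimension: 512 // 8 = 64

# Shapes implied by the hyperparameters:
shapes = {
    "token embedding (one token)":   (d,),         # each token is a d-dim vector
    "Q/K/V projection (one head)":   (d, d_head),  # d -> d / n_head
    "attention output projection":   (d, d),       # concatenated heads back to d
    "FFN first layer":               (d, d_ffn),   # d -> d_ffn (expand)
    "FFN second layer":              (d_ffn, d),   # d_ffn -> d (contract)
}

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Note that the attention sublayer and the FFN sublayer both map d-dimensional inputs back to d-dimensional outputs, which is what allows the residual connections to add them elementwise.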

Updated 2025-10-08


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science