Learn Before
Matching

In a standard Transformer model's architecture, various components have specific dimensionalities defined by key hyperparameters. Match each component listed below with its correct dimensionality, using the following notation: d represents the hidden size, d_ffn is the size of the feed-forward network's inner layer, and n_head is the number of attention heads.
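As background for the matching exercise, the sketch below (not part of the original question; the example values d=512, d_ffn=2048, n_head=8 are illustrative assumptions taken from the original Transformer configuration) shows how the three hyperparameters determine the shapes of the main weight matrices in a standard Transformer layer.

```python
# Illustrative sketch: typical dimensionalities in a standard Transformer.
# Example hyperparameter values are assumptions, not from the question above.

d = 512        # hidden size
d_ffn = 2048   # feed-forward network inner layer size
n_head = 8     # number of attention heads

# The hidden size is split evenly across heads.
d_head = d // n_head  # per-head dimension: 512 // 8 = 64

# Shapes implied by the hyperparameters:
shapes = {
    "token embedding (one token)":   (d,),         # each token is a d-dim vector
    "Q/K/V projection (one head)":   (d, d_head),  # d -> d / n_head
    "attention output projection":   (d, d),       # concatenated heads back to d
    "FFN first layer":               (d, d_ffn),   # d -> d_ffn (expand)
    "FFN second layer":              (d_ffn, d),   # d_ffn -> d (contract)
}

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Note that the attention sublayer and the FFN sublayer both map d-dimensional inputs back to d-dimensional outputs, which is what allows the residual connections to add them elementwise.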

Updated 2025-10-08


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science