Short Answer

Impact of Hidden Size on Sub-Layer Dimensions

A neural network architecture for language processing contains two main sub-layers, repeated multiple times: a self-attention mechanism and a position-wise feed-forward network. Both sub-layers process vectors of a fixed dimension, known as the hidden size (d). Analyze how this single hyperparameter, d, determines the shape of the primary weight matrices within both the self-attention and the feed-forward network sub-layers.
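
To make the shapes concrete, here is a minimal NumPy sketch, assuming single-head attention, an illustrative hidden size of d = 512, and the common (but not mandatory) convention of a feed-forward inner dimension d_ff = 4d; the specific values are assumptions for illustration, not part of the question:

```python
import numpy as np

d = 512       # hidden size (illustrative value, assumed)
d_ff = 4 * d  # feed-forward inner dimension; 4*d is a common convention, not fixed by d

# Self-attention: query, key, value, and output projections all map d -> d,
# so each weight matrix has shape (d, d)
W_q = np.zeros((d, d))
W_k = np.zeros((d, d))
W_v = np.zeros((d, d))
W_o = np.zeros((d, d))

# Position-wise feed-forward: expand d -> d_ff, then project back d_ff -> d
W_1 = np.zeros((d, d_ff))
W_2 = np.zeros((d_ff, d))

# A token representation of shape (d,) flows through both sub-layers
# and comes out with the same dimension d
x = np.zeros(d)
assert (x @ W_q).shape == (d,)
assert ((x @ W_1) @ W_2).shape == (d,)
```

In multi-head attention the d-to-d projections are typically split across heads, but the concatenated projection shapes still depend only on d, so both sub-layers' weight matrices are determined by this single hyperparameter.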

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science