Learn Before
Impact of Hidden Size on Sub-Layer Dimensions
A neural network architecture for language processing contains two main sub-layers that are repeated multiple times: a self-attention mechanism and a position-wise feed-forward network. Both of these sub-layers are designed to process vectors of a fixed dimension, known as the hidden size (H). Analyze how this single hyperparameter, H, determines the shape of the primary weight matrices within both the self-attention and the feed-forward network sub-layers.
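As a minimal sketch of the relationship described above (using PyTorch, and assuming the common conventions of H-to-H attention projections and a 4x expansion in the feed-forward network; the specific numbers are illustrative, not taken from the card):

```python
import torch.nn as nn

H = 512        # hidden size (illustrative value)
d_ff = 4 * H   # feed-forward inner size; the 4x ratio is a common convention, not a requirement

# Self-attention sub-layer: the query, key, value, and output projections
# each map a vector of size H to a vector of size H.
w_q = nn.Linear(H, H, bias=False)
w_k = nn.Linear(H, H, bias=False)
w_v = nn.Linear(H, H, bias=False)
w_o = nn.Linear(H, H, bias=False)

# Position-wise feed-forward sub-layer: expand from H to d_ff, then project back to H.
ffn_in = nn.Linear(H, d_ff)
ffn_out = nn.Linear(d_ff, H)

# nn.Linear stores its weight as (out_features, in_features), so every shape printed
# below is expressed purely in terms of H (and d_ff, which is itself derived from H).
for name, layer in [("W_Q", w_q), ("W_K", w_k), ("W_V", w_v), ("W_O", w_o),
                    ("FFN_in", ffn_in), ("FFN_out", ffn_out)]:
    print(name, tuple(layer.weight.shape))
```

Changing H alone rescales every one of these matrices at once, which is the dependency the prompt asks you to analyze.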
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer is designing a neural network for a complex language task and decides to significantly increase the dimensionality of the vectors that are processed within the network's internal sub-layers. What is the most direct trade-off the engineer should expect from this change?
Impact of Hidden Size on Sub-Layer Dimensions
In a standard Transformer model's architecture, various components have specific dimensionalities defined by key hyperparameters. Match each component listed below with its correct dimensionality, using the following notation: H represents the hidden size, d_ff is the size of the feed-forward network's inner layer, and A is the number of attention heads.
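As a rough, hedged illustration of the trade-off raised in the first related question and the per-head dimensionality touched on in the second (plain Python; the head count and the 4x feed-forward ratio are assumptions, not values from the cards): because the dominant weight matrices are H x H and H x d_ff, the parameter count per block grows roughly quadratically with the hidden size, and each of the A attention heads operates on a slice of size H / A.

```python
def block_params(H, d_ff=None):
    """Back-of-the-envelope weight count for one Transformer block
    (biases, embeddings, and layer norms ignored)."""
    d_ff = 4 * H if d_ff is None else d_ff
    attention = 4 * H * H        # W_Q, W_K, W_V, W_O are each H x H
    feed_forward = 2 * H * d_ff  # H x d_ff expansion plus d_ff x H projection
    return attention + feed_forward

A = 8  # number of attention heads (illustrative)
for H in (512, 1024):
    print(f"H={H}: ~{block_params(H):,} weights per block, per-head size H/A = {H // A}")
```

Doubling H from 512 to 1024 roughly quadruples the weight count (about 3.1M to about 12.6M here), which is the most direct cost of increasing the dimensionality of the vectors processed inside the sub-layers.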