Learn Before
A machine learning engineer is designing a neural network for a complex language task and decides to significantly increase the dimensionality of the vectors that are processed within the network's internal sub-layers. What is the most direct trade-off the engineer should expect from this change?
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Impact of Hidden Size on Sub-Layer Dimensions
In a standard Transformer architecture, the dimensionality of each component is determined by a few key hyperparameters. Match each component listed below with its correct dimensionality, expressed in terms of three quantities: the hidden size, the size of the feed-forward network's inner layer, and the number of attention heads.
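As a rough illustration of how the hidden size drives sub-layer dimensions, here is a minimal sketch that lists the weight shapes of one standard Transformer layer and counts its weights. The names `d`, `d_ffn`, and `n_head`, as well as both helper functions, are hypothetical choices made for this example; biases, layer normalization, and embeddings are ignored, and the per-head size is assumed to be the hidden size divided by the number of heads.

```python
def layer_weight_shapes(d, d_ffn, n_head):
    """Weight shapes for one standard Transformer layer (illustrative only).

    d      -- hidden size (assumed name)
    d_ffn  -- size of the feed-forward inner layer (assumed name)
    n_head -- number of attention heads (assumed name)
    """
    if d % n_head != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    d_head = d // n_head  # assumption: heads split the hidden size evenly
    return {
        "query_per_head": (d, d_head),
        "key_per_head": (d, d_head),
        "value_per_head": (d, d_head),
        "attention_output": (d, d),
        "ffn_in": (d, d_ffn),
        "ffn_out": (d_ffn, d),
    }


def layer_weight_count(d, d_ffn, n_head):
    """Count weights in one layer, ignoring biases and normalization."""
    shapes = layer_weight_shapes(d, d_ffn, n_head)
    total = 0
    for name, (rows, cols) in shapes.items():
        # Per-head projections exist once for every attention head.
        copies = n_head if name.endswith("_per_head") else 1
        total += copies * rows * cols
    return total


if __name__ == "__main__":
    for d in (512, 1024):
        # Assumption: the inner layer is four times the hidden size.
        print(d, layer_weight_count(d, 4 * d, n_head=8))
```

Under these assumptions the count works out to 4·d² + 2·d·d_ffn, so doubling the hidden size while scaling the inner layer with it roughly quadruples the weights per layer. This is the capacity-versus-cost trade-off that a larger hidden size brings.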