Learn Before
Calculating Parameter Impact of FFN Expansion
A model's architecture includes a feed-forward sub-layer composed of two linear transformations with an intermediate expansion. The model's main hidden size is 512. The intermediate layer's size is initially set to 2048. If an engineer increases this intermediate size to 3072 to improve model capacity, how many additional parameters are introduced into this specific sub-layer? (You can ignore bias terms in your calculation.)
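One way to check the arithmetic is the minimal sketch below. It assumes the sub-layer consists of exactly two weight matrices, one mapping the hidden size to the intermediate size and one mapping back, with biases ignored as the question allows. The function name ffn_weight_count and its parameter names are illustrative and are not taken from the course material.

```python
def ffn_weight_count(d_model: int, d_inner: int) -> int:
    """Count the weights of a two-layer feed-forward sub-layer, ignoring biases."""
    up = d_model * d_inner    # first linear map: d_model -> d_inner
    down = d_inner * d_model  # second linear map: d_inner -> d_model
    return up + down


before = ffn_weight_count(512, 2048)  # original intermediate size
after = ffn_weight_count(512, 3072)   # expanded intermediate size
added = after - before

print(f"before: {before:,}")  # before: 2,097,152
print(f"after:  {after:,}")   # after:  3,145,728
print(f"added:  {added:,}")   # added:  1,048,576
```

Under these assumptions the increase equals 2 x 512 x (3072 - 2048) = 1,048,576 weights, since each extra intermediate unit adds one column to the first matrix and one row to the second.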
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team of engineers is designing a large neural network for a complex language task. Within each block of their model, they use a sub-network composed of two linear transformations with a non-linearity in between. They are debating whether to make the dimensionality of the intermediate layer in this sub-network significantly larger (e.g., four times larger) than the model's primary embedding and hidden state dimension. What is the primary trade-off they must consider when making this decision?
Optimizing Transformer Model Size
Calculating Parameter Impact of FFN Expansion