Learn Before
Optimizing Transformer Model Size
Evaluate the two strategies described in the case study. Which one is more likely to preserve the model's performance, and why? Justify your answer based on the role of the feed-forward network's intermediate layer.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team of engineers is designing a large neural network for a complex language task. Within each block of their model, they use a sub-network composed of two linear transformations with a non-linearity in between. They are debating whether to make the dimensionality of the intermediate layer in this sub-network significantly larger (e.g., four times larger) than the model's primary embedding and hidden state dimension. What is the primary trade-off they must consider when making this decision?
Optimizing Transformer Model Size
Calculating Parameter Impact of FFN Expansion
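The trade-off in the related question above is quantifiable: the feed-forward sub-network's parameter count grows linearly with the expansion factor of its intermediate layer. The sketch below is a minimal illustration of that arithmetic; the function name `ffn_params` and the example dimensions are assumptions for illustration, not part of the original question.

```python
def ffn_params(d_model, expansion=4, bias=True):
    """Parameter count for a two-layer FFN: d_model -> d_ff -> d_model.

    `expansion` is the ratio of the intermediate dimension d_ff
    to the model's embedding/hidden dimension d_model (4x is common
    in Transformer architectures).
    """
    d_ff = expansion * d_model
    weights = d_model * d_ff + d_ff * d_model  # up- and down-projection matrices
    biases = (d_ff + d_model) if bias else 0   # one bias vector per projection
    return weights + biases

# Hypothetical d_model = 1024: a 4x intermediate layer holds roughly
# twice the FFN parameters of a 2x one.
print(ffn_params(1024, expansion=4))  # 8,393,728
print(ffn_params(1024, expansion=2))  # 4,197,376
```

This makes the engineers' dilemma concrete: a wider intermediate layer increases the FFN's representational capacity but inflates parameter count, memory footprint, and compute per token in direct proportion to the expansion factor.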