Short Answer

Calculating Parameter Impact of FFN Expansion

A model's architecture includes a feed-forward sub-layer composed of two linear transformations with an intermediate expansion. The model's main hidden size, $d$, is 512. The intermediate layer's size, $d_{ffn}$, is initially set to 2048. If an engineer increases this intermediate size to 3072 to improve model capacity, how many additional parameters are introduced into this specific sub-layer? (You can ignore bias terms in your calculation.)
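The arithmetic behind this question can be checked with a short sketch. It assumes the usual layout for such a sub-layer: a first weight matrix of shape $d \times d_{ffn}$ that expands the hidden state and a second of shape $d_{ffn} \times d$ that projects it back, so the weight count is $2 \cdot d \cdot d_{ffn}$. The helper name `ffn_weight_params` is hypothetical and used only for illustration.

```python
def ffn_weight_params(d_model: int, d_ffn: int) -> int:
    """Count weights in a two-layer FFN, ignoring biases (hypothetical helper).

    Assumes W1 has shape (d_model, d_ffn) and W2 has shape (d_ffn, d_model).
    """
    w1 = d_model * d_ffn  # expansion: d_model -> d_ffn
    w2 = d_ffn * d_model  # projection back: d_ffn -> d_model
    return w1 + w2


d = 512
before = ffn_weight_params(d, 2048)
after = ffn_weight_params(d, 3072)
added = after - before

print(f"before={before:,} after={after:,} added={added:,}")
```

Because the count is linear in $d_{ffn}$, the added parameters can also be computed directly as $2 \cdot d \cdot (3072 - 2048) = 2 \cdot 512 \cdot 1024$.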


Updated 2025-10-08


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science