In a standard two-layer feed-forward network (FFN) within a Transformer, an input vector h has a dimension of d = 512. The network's hidden layer has a dimension of d_h = 2048. The FFN is defined by the operation: Output = σ(h * W_h + b_h) * W_f + b_f, where σ is a non-linear activation function. What must be the dimensions of the weight matrix W_f for the output vector to have the same dimension as the input vector h?
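Since the hidden activation σ(h * W_h + b_h) has dimension d_h = 2048 and the output must return to d = 512, W_f must have shape d_h × d, i.e. 2048 × 512. A minimal NumPy shape check (illustrative variable names; ReLU is assumed here as the non-linearity σ):

```python
import numpy as np

d, d_h = 512, 2048

h   = np.random.randn(d)        # input vector, shape (512,)
W_h = np.random.randn(d, d_h)   # first projection, shape (512, 2048)
b_h = np.zeros(d_h)
W_f = np.random.randn(d_h, d)   # second projection must be (2048, 512)
b_f = np.zeros(d)

hidden = np.maximum(h @ W_h + b_h, 0.0)  # ReLU, shape (2048,)
out    = hidden @ W_f + b_f              # shape (512,), same as the input h
assert out.shape == h.shape
```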
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
ReLU (Rectified Linear Unit)
Importance of Activation Function Design in Wide FFNs
Troubleshooting FFN Dimension Mismatch
You’re debugging a Transformer block in an interna...
You are reviewing a teammate’s implementation of a...
You’re implementing a single Transformer block in ...
Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)
Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement
Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics
Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change
Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts
Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring
Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block