Short Answer

Troubleshooting FFN Dimension Mismatch

A Transformer's Feed-Forward Network (FFN) takes an input vector of dimension d = 768 and processes it through a hidden layer of dimension d_h = 3072 before producing an output vector of the same dimension as the input (d = 768). The intermediate vector, after the first linear transformation and activation function, correctly has a dimension of 3072. If a dimension mismatch error occurs during the second linear transformation, what are the required dimensions for the second weight matrix (W_f) to resolve this error? Explain your reasoning based on the rules of matrix multiplication.
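The shape rule behind this question can be checked with a few lines of plain Python. This is a minimal sketch, not a reference implementation: it assumes the row-vector convention (the vector multiplies the weight matrix from the left), and the helper `matmul_shape` and the name `W_h` for the first weight matrix are hypothetical, introduced only for illustration. Under the column-vector convention the same reasoning holds with both weight shapes transposed.

```python
def matmul_shape(left, right):
    """Return the shape of left @ right, or raise ValueError if the inner dimensions differ."""
    rows, inner_left = left
    inner_right, cols = right
    if inner_left != inner_right:
        raise ValueError(f"inner dimensions differ: {inner_left} vs {inner_right}")
    return (rows, cols)


d, d_h = 768, 3072

x = (1, d)        # input treated as a 1 x 768 row vector (assumed convention)
W_h = (d, d_h)    # first weight matrix (hypothetical name), 768 x 3072
h = matmul_shape(x, W_h)  # the activation is element-wise, so the shape is unchanged

W_f = (d_h, d)    # second weight matrix: its rows must match the 3072 columns of h
y = matmul_shape(h, W_f)

print(h, y)
```

The product is defined only when the number of columns of the left operand equals the number of rows of the right operand, so passing a `W_f` whose first dimension is not 3072 to `matmul_shape(h, W_f)` raises the `ValueError` that stands in for the mismatch error.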

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
