Case Study

Evaluating FFN Design Trade-offs in a Resource-Constrained LLM Project

A research team is developing a new language model on a fixed computational budget, and the Feed-Forward Network (FFN) sub-layer is the primary performance bottleneck. The team is debating two design options for the FFN:

  • Option A: Use a simple, computationally inexpensive non-linear activation function. The compute saved lets them make the hidden layer's dimension extremely large.
  • Option B: Use a more complex, computationally expensive activation function that is theorized to be more expressive per neuron. To stay within the same budget, this would require significantly reducing the hidden layer's dimension.

Based on the principles of designing wide FFNs for modern large-scale models, which option should the team choose? Justify your decision by evaluating the trade-offs between the activation function's complexity and the hidden layer's width in terms of model capacity and computational efficiency.
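Since the answer hinges on where the FFN's compute budget actually goes, a back-of-the-envelope sketch may help frame the evaluation. The Python below assumes the standard two-matrix FFN form y = W2·act(W1·x); the dimensions (d_model = 4096, hidden widths of 16,384 and 4,096) and the per-unit activation costs (1 and 50 FLOPs) are illustrative assumptions, not values given in the case study.

```python
# Back-of-the-envelope comparison of the two FFN options.
# All dimensions and per-unit activation costs are hypothetical,
# chosen only to illustrate the trade-off.

def ffn_flops_per_token(d_model: int, d_ff: int, act_flops_per_unit: float) -> float:
    """Approximate FLOPs per token for a two-matrix FFN, y = W2 @ act(W1 @ x),
    counting each multiply-add as 2 FLOPs and ignoring biases."""
    matmul_flops = 2 * (2 * d_model * d_ff)       # W1 and W2 projections
    activation_flops = act_flops_per_unit * d_ff  # element-wise activation
    return matmul_flops + activation_flops

d_model = 4096  # hypothetical embedding dimension

# Option A: cheap activation (e.g. ReLU, ~1 FLOP per unit), very wide hidden layer.
a_total = ffn_flops_per_token(d_model, d_ff=16_384, act_flops_per_unit=1)
a_act = 1 * 16_384

# Option B: expensive activation (assume ~50 FLOPs per unit), hidden layer
# shrunk to "pay" for the costlier non-linearity.
b_total = ffn_flops_per_token(d_model, d_ff=4_096, act_flops_per_unit=50)
b_act = 50 * 4_096

print(f"Option A: {a_total / 1e6:7.1f} MFLOPs/token, "
      f"activation share = {100 * a_act / a_total:.3f}%")
print(f"Option B: {b_total / 1e6:7.1f} MFLOPs/token, "
      f"activation share = {100 * b_act / b_total:.3f}%")
```

Under these assumed numbers, the element-wise activation accounts for well under 1% of the FFN's compute in either option: the two matrix multiplications dominate the budget, so an expensive activation buys almost nothing back, while shrinking the hidden dimension removes most of the sub-layer's parameters and capacity.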
