Multiple Choice

An engineer is designing the feed-forward network (FFN) within a transformer block for a new large language model. They choose to implement a Gated Linear Unit with a GELU activation (GeGLU). Which statement best analyzes the primary advantage of using this specific activation structure compared to a simpler, non-gated activation function like ReLU?
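For reference, the two FFN structures being compared can be sketched as below. This is a minimal illustration in pure Python, not a production implementation. It assumes the common bias-free formulation in which the GeGLU hidden state is GELU(x W_gate) multiplied elementwise by (x W_up) and then projected by W_down. The names `relu_ffn`, `geglu_ffn`, `W_gate`, `W_up`, and `W_down` are hypothetical and chosen only for this example.

```python
import math

def gelu(v):
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def relu(v):
    return max(0.0, v)

def matvec(x, W):
    # x: list of length d_in; W: nested list of shape d_in x d_out.
    # Returns the row-vector product x @ W as a list of length d_out.
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def relu_ffn(x, W_in, W_out):
    # Non-gated FFN: one up-projection, a pointwise ReLU, one down-projection.
    hidden = [relu(h) for h in matvec(x, W_in)]
    return matvec(hidden, W_out)

def geglu_ffn(x, W_gate, W_up, W_down):
    # Gated FFN (assumed bias-free form): two parallel up-projections.
    # The GELU-activated gate path scales the linear "up" path elementwise,
    # so each hidden unit is modulated by an input-dependent factor.
    gate = [gelu(g) for g in matvec(x, W_gate)]
    up = matvec(x, W_up)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(hidden, W_down)

# Toy weights for illustration only (d_model = 2, d_hidden = 3).
x = [1.0, -2.0]
W_gate = [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]]
W_up = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]
W_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

y_geglu = geglu_ffn(x, W_gate, W_up, W_down)
y_relu = relu_ffn(x, W_up, W_down)
```

Note that the gated variant uses three weight matrices instead of two, so implementations that want to match the parameter count of a non-gated FFN typically shrink the hidden width to compensate.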

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science