Learn Before
Multiple Choice

The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?

0

1

Updated 2025-10-03

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science