Multiple Choice

A standard transformer-based language model layer consists of a self-attention mechanism followed by a feed-forward network (FFN). An alternative architecture aims for greater parameter capacity and computational efficiency by using a routing mechanism to selectively activate one of several specialized 'expert' sub-networks within each layer for a given input. Based on this design, which component of the standard transformer layer are these 'expert' sub-networks most directly implementing and parallelizing?
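To make the routing idea concrete, here is a minimal sketch of a top-1 mixture-of-experts layer, where each expert has the same two-layer feed-forward shape as a standard transformer FFN block. All names, dimensions, and the ReLU/softmax choices are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4

# Each expert is a small two-layer feed-forward network (FFN),
# the same shape as the FFN sub-layer it replaces. (Illustrative sizes.)
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]

# Router: a learned linear projection that scores each expert per token.
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Top-1 mixture-of-experts: route each token to one expert FFN."""
    logits = x @ router_w                        # (tokens, n_experts)
    choice = logits.argmax(axis=-1)              # chosen expert per token
    # Softmax gate weight of the chosen expert.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            w1, w2 = experts[e]
            h = np.maximum(x[mask] @ w1, 0.0)    # expert's FFN with ReLU
            out[mask] = (h @ w2) * probs[mask, e][:, None]
    return out

tokens = rng.standard_normal((5, d_model))
y = moe_layer(tokens)
print(y.shape)  # (5, 8): output has the same shape as a dense FFN's
```

Note that only one expert's weights are used per token, so compute per token stays close to a single FFN while total parameter count grows with the number of experts.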


Updated 2025-10-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science