Multiple Choice

A standard transformer-based language model layer consists of a self-attention mechanism followed by a feed-forward network (FFN). An alternative architecture aims for greater parameter capacity and computational efficiency by using a routing mechanism to selectively activate one of several specialized 'expert' sub-networks within each layer for a given input. Based on this design, which component of the standard transformer layer are these 'expert' sub-networks most directly implementing and parallelizing?
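To make the routing idea concrete, here is a minimal sketch of a top-1 mixture-of-experts layer, where each expert has the same two-layer feed-forward shape as a standard transformer FFN block. All names, dimensions, and the ReLU/softmax choices are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4

# Each expert is a small two-layer feed-forward network (FFN),
# the same shape as the FFN sub-layer it replaces. (Illustrative sizes.)
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]

# Router: a learned linear projection that scores each expert per token.
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Top-1 mixture-of-experts: route each token to one expert FFN."""
    logits = x @ router_w                        # (tokens, n_experts)
    choice = logits.argmax(axis=-1)              # chosen expert per token
    # Softmax gate weight of the chosen expert.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            w1, w2 = experts[e]
            h = np.maximum(x[mask] @ w1, 0.0)    # expert's FFN with ReLU
            out[mask] = (h @ w2) * probs[mask, e][:, None]
    return out

tokens = rng.standard_normal((5, d_model))
y = moe_layer(tokens)
print(y.shape)  # (5, 8): output has the same shape as a dense FFN's
```

Note that only one expert's weights are used per token, so compute per token stays close to a single FFN while total parameter count grows with the number of experts.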


Updated 2025-10-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science