Learn Before
A team is designing a large language model intended for deployment on edge devices with limited memory and processing power. They are considering two different architectural modifications to reduce computational demands during inference:
- Modification X: A design where, for any given input, only a specific subset of the model's total parameters is activated and used for computation. The full set of parameters must still be available in memory.
- Modification Y: A design where, after initial training, a significant percentage of the model's parameters are permanently removed, resulting in a smaller, sparser model.
Which statement best analyzes the primary trade-off between these two modifications for this specific deployment scenario?
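The memory-versus-compute distinction at the heart of the trade-off can be made concrete with a toy calculation. A minimal sketch, assuming an illustrative 8B-parameter base model with hypothetical activation and pruning fractions (none of these numbers come from the scenario itself):

```python
# Illustrative parameter budget for the two modifications (assumed numbers).
TOTAL_PARAMS = 8_000_000_000  # hypothetical dense model size

# Modification X (sparse activation): all parameters must stay resident
# in memory, but only a fraction participates in each forward pass.
ACTIVE_FRACTION_X = 0.25
mem_x = TOTAL_PARAMS                         # full set kept in memory
compute_x = TOTAL_PARAMS * ACTIVE_FRACTION_X # params used per input

# Modification Y (pruning): parameters are permanently removed, so both
# the memory footprint and the per-input compute shrink together.
PRUNED_FRACTION_Y = 0.5
mem_y = TOTAL_PARAMS * (1 - PRUNED_FRACTION_Y)
compute_y = mem_y                            # every remaining param is used

print(f"X: memory {mem_x / 1e9:.1f}B params, compute {compute_x / 1e9:.1f}B params/input")
print(f"Y: memory {mem_y / 1e9:.1f}B params, compute {compute_y / 1e9:.1f}B params/input")
```

Under these assumed fractions, Modification X cuts per-input compute the most but offers no memory savings, while Modification Y reduces memory and compute together, which matters on memory-constrained edge devices.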
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating an Architectural Modification for LLM Inference
Choosing an Efficient LLM Architecture for a Chatbot