Learn Before
A team is designing a large language model intended for deployment on edge devices with limited memory and processing power. They are considering two different architectural modifications to reduce computational demands during inference:
- Modification X: A design where, for any given input, only a specific subset of the model's total parameters is activated and used for computation. The full set of parameters must still be available in memory.
- Modification Y: A design where, after initial training, a significant percentage of the model's parameters are permanently removed, resulting in a smaller, sparser model.
Which statement best analyzes the primary trade-off between these two modifications for this specific deployment scenario?
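The memory-versus-compute distinction at the heart of the trade-off can be made concrete with a toy calculation. A minimal sketch, assuming an illustrative 8B-parameter base model with hypothetical activation and pruning fractions (none of these numbers come from the scenario itself):

```python
# Illustrative parameter budget for the two modifications (assumed numbers).
TOTAL_PARAMS = 8_000_000_000  # hypothetical dense model size

# Modification X (sparse activation): all parameters must stay resident
# in memory, but only a fraction participates in each forward pass.
ACTIVE_FRACTION_X = 0.25
mem_x = TOTAL_PARAMS                         # full set kept in memory
compute_x = TOTAL_PARAMS * ACTIVE_FRACTION_X # params used per input

# Modification Y (pruning): parameters are permanently removed, so both
# the memory footprint and the per-input compute shrink together.
PRUNED_FRACTION_Y = 0.5
mem_y = TOTAL_PARAMS * (1 - PRUNED_FRACTION_Y)
compute_y = mem_y                            # every remaining param is used

print(f"X: memory {mem_x / 1e9:.1f}B params, compute {compute_x / 1e9:.1f}B params/input")
print(f"Y: memory {mem_y / 1e9:.1f}B params, compute {compute_y / 1e9:.1f}B params/input")
```

Under these assumed fractions, Modification X cuts per-input compute the most but offers no memory savings, while Modification Y reduces memory and compute together, which matters on memory-constrained edge devices.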
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating an Architectural Modification for LLM Inference
Choosing an Efficient LLM Architecture for a Chatbot