Learn Before
Evaluating an Architectural Modification for LLM Inference
Critique the engineering team's proposal to adopt a mixture-of-experts (MoE) architecture. In your evaluation, justify whether this strategy is appropriate for addressing the observed problems of high battery consumption and latency. Explain the underlying reason this architectural change would reduce computational demand during inference, and discuss at least one trade-off the team must consider.
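For concreteness, the sketch below illustrates the conditional-computation idea behind MoE at inference time: a gating network routes each token to a small number of experts, so per-token compute scales with the experts actually used rather than with the total parameter count, while every expert must still be resident in memory. All names and sizes here (`d_model`, `n_experts`, `top_k`) are illustrative assumptions, not details from the course material.

```python
# Minimal MoE layer sketch, assuming a top-k gating router over independent
# feed-forward experts. Sizes and names are illustrative, not from the course.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Every expert must be loaded in memory, even though only top_k of
        # them run for any given token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # pick top_k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    # Only the selected experts compute for these tokens, so
                    # per-token FLOPs scale with top_k, not with n_experts.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Note how the design choice shows up directly in the loop: compute is proportional to `top_k` experts per token, yet the full `n_experts` set of weights occupies memory throughout.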
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating an Architectural Modification for LLM Inference
A team is designing a large language model intended for deployment on edge devices with limited memory and processing power. They are considering two different architectural modifications to reduce computational demands during inference:
- Modification X: A design in which, for any given input, only a specific subset of the model's total parameters is activated and used for computation. The full set of parameters must still be resident in memory.
- Modification Y: A design in which, after initial training, a significant percentage of the model's parameters are permanently removed, yielding a smaller, sparser model.
Which statement best analyzes the primary trade-off between these two modifications for this specific deployment scenario?
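To make the contrast concrete, here is a back-of-envelope sketch; every number in it (model size, weight precision, active and kept fractions) is assumed purely for illustration, since the question specifies none of them.

```python
# Back-of-envelope comparison of the two modifications. All figures below
# are hypothetical assumptions made for this sketch.
total_params = 7_000_000_000   # hypothetical dense baseline: 7B parameters
bytes_per_param = 2            # fp16/bf16 weights

# Modification X (conditional computation, MoE-style): a fraction of the
# parameters is active per token, but all parameters stay resident in memory.
active_fraction_x = 0.25
mem_x_gb = total_params * bytes_per_param / 1e9
active_x = total_params * active_fraction_x

# Modification Y (pruning): parameters are permanently removed, shrinking
# the memory footprint and the per-token compute together.
kept_fraction_y = 0.5
mem_y_gb = total_params * kept_fraction_y * bytes_per_param / 1e9
active_y = total_params * kept_fraction_y

print(f"X: {mem_x_gb:.0f} GB resident, {active_x/1e9:.2f}B params active per token")
print(f"Y: {mem_y_gb:.0f} GB resident, {active_y/1e9:.2f}B params active per token")
```

Under these assumed numbers, the tension the question targets becomes visible: Modification X lowers per-token compute without shrinking the resident memory footprint, while Modification Y shrinks both but permanently discards parameters, and with them some model capacity.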
Choosing an Efficient LLM Architecture for a Chatbot