Learn Before
Choosing an Efficient LLM Architecture for a Chatbot
A company needs to improve the inference speed of its large language model, which is currently too slow for a real-time, interactive chatbot application. Two architectural modification proposals are on the table. Evaluate the proposals and recommend one, justifying your decision based on the competing goals of reducing latency, minimizing redevelopment effort, and maintaining response quality.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating an Architectural Modification for LLM Inference
A team is designing a large language model intended for deployment on edge devices with limited memory and processing power. They are considering two different architectural modifications to reduce computational demands during inference:
- Modification X: A design where, for any given input, only a specific subset of the model's total parameters are activated and used for computation. The full set of parameters must still be available in memory.
- Modification Y: A design where, after initial training, a significant percentage of the model's parameters are permanently removed, resulting in a smaller, sparser model.
Which statement best analyzes the primary trade-off between these two modifications for this specific deployment scenario?
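To make the trade-off concrete, here is a minimal numerical sketch of the two modifications. The sizes, the random "router," and the random pruning mask are all illustrative stand-ins, not real model components: Modification X resembles a mixture-of-experts layer (all parameters resident in memory, only a subset used per input), while Modification Y resembles post-training pruning (parameters removed outright, shrinking both memory and compute).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
d, n_experts, k = 16, 8, 2  # hidden size, total experts, experts active per input

# Modification X (sparse activation, MoE-style): ALL expert weights must
# reside in memory, but each input routes through only k of them.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
x = rng.standard_normal(d)
active = rng.choice(n_experts, size=k, replace=False)  # stand-in for a learned router
y_sparse = sum(experts[i] @ x for i in active) / k

mem_x = n_experts * d * d   # parameters that must be held in memory
used_x = k * d * d          # parameters actually used per input

# Modification Y (pruning): parameters are permanently removed, so the
# memory footprint and per-input compute shrink together.
dense = rng.standard_normal((d, d))
mask = rng.random((d, d)) > 0.5  # stand-in for a magnitude-pruning mask
pruned = dense * mask
y_pruned = pruned @ x

mem_y = used_y = int(mask.sum())  # surviving parameters

print(f"X: {mem_x} params in memory, {used_x} used per input")
print(f"Y: {mem_y} params in memory, {used_y} used per input")
```

The sketch shows why Modification X reduces compute but not memory (the full parameter set stays resident), which is exactly the property that matters on a memory-constrained edge device, whereas Modification Y reduces both at the potential cost of response quality.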