Learn Before
A company's AI-powered customer support system experiences significant slowdowns during peak traffic times. The system can only process a limited number of user queries simultaneously, creating a bottleneck. To address this, the primary goal is to increase the number of queries the model can handle per second. Which of the following actions would most directly improve this specific performance metric?
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Model Selection for a High-Traffic Application
A company is evaluating two language model systems for a real-time translation service. System A processes 240,000 tokens in 60 seconds. System B processes 300,000 tokens in 120 seconds. Based on their processing capacity, which system is more efficient for this high-demand task and why?
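The two-system comparison in the related question above reduces to simple arithmetic if "processing capacity" is read as throughput in tokens per second, which is an assumption here rather than something the question states. A minimal sketch of that calculation follows; the function name `throughput` is hypothetical and chosen only for illustration.

```python
def throughput(tokens: int, seconds: float) -> float:
    """Return tokens processed per second (assumed definition of throughput)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures taken from the related question.
system_a = throughput(240_000, 60)    # System A: 240,000 tokens in 60 s
system_b = throughput(300_000, 120)   # System B: 300,000 tokens in 120 s

# The system with the higher tokens-per-second figure has more capacity.
faster = "System A" if system_a > system_b else "System B"

print(f"System A: {system_a:.0f} tokens/s")
print(f"System B: {system_b:.0f} tokens/s")
print(f"Higher throughput: {faster}")
```

Under this reading, System B processes more tokens in total but takes twice as long, so its per-second rate is lower than System A's.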