Learn Before
Optimizing Chatbot Performance
A company is developing a real-time customer service chatbot. During testing, they observe that while the model itself generates answers quickly, users experience a noticeable delay from the moment they send their query to the moment the complete answer is displayed. To systematically address this issue, what single efficiency metric should the development team measure, and what are the three distinct phases or time segments that constitute this overall metric?
0
1
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An online translation service uses the same large language model deployed in two different configurations, System A and System B. System A, located geographically close to its users, consistently delivers full translations in 2 seconds. System B, located on a different continent from its users, takes 5 seconds to deliver the same translations. The computational hardware and model software are identical for both systems. Based on this information, which component most likely accounts for the majority of the 3-second difference in the total time it takes to receive a response?
LLM Provider Selection for a Real-Time Chatbot
Optimizing Chatbot Performance