Learn Before
LLM Deployment Strategy for a Startup
A startup is developing a customer service chatbot and needs to choose between two large language models. It has a limited budget and can only afford to run the service on a single mid-range server whose GPU has 12 GB of memory. Your task is to analyze the provided data and recommend which model the startup should deploy. Justify your recommendation by evaluating the trade-offs between model performance and computational resource demands.
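One way to begin this analysis is a back-of-the-envelope VRAM estimate: a model's weights need roughly (parameter count × bytes per parameter), plus some headroom for the KV cache and activations during inference. The sketch below is illustrative only; the model sizes, the 2-bytes-per-parameter (fp16) and 1-byte (int8) precisions, and the 1.2× overhead factor are assumptions, not figures from the scenario.

```python
# Hypothetical sketch: does a model fit in a 12 GB GPU?
# All model sizes and overhead factors below are illustrative assumptions.

def estimate_vram_gb(num_params_billion: float,
                     bytes_per_param: int = 2,
                     overhead_factor: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory times a fixed overhead
    factor covering the KV cache and activations at inference time."""
    weight_gb = num_params_billion * 1e9 * bytes_per_param / 1024**3
    return weight_gb * overhead_factor

GPU_MEMORY_GB = 12  # the startup's single mid-range GPU

# Compare two hypothetical candidate models at fp16 and int8 precision.
for name, params_b in [("Model A (13B)", 13.0), ("Model B (7B)", 7.0)]:
    for precision, bytes_pp in [("fp16", 2), ("int8", 1)]:
        need = estimate_vram_gb(params_b, bytes_pp)
        fits = "fits" if need <= GPU_MEMORY_GB else "does not fit"
        print(f"{name} @ {precision}: ~{need:.1f} GB -> {fits}")
```

Under these assumptions, only the smaller model quantized to int8 comes in under 12 GB, which is the kind of concrete constraint a recommendation should rest on before comparing benchmark quality.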
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A software team deploys a large language model on a server to power a real-time translation service. During periods of high user traffic, they observe a significant increase in translation latency. They collect the following average resource usage metrics from the server during these high-traffic periods:
- GPU Processing Power Usage: 98%
- GPU Memory Consumption: 95%
- CPU Processing Power Usage: 15%
- System Memory (RAM) Consumption: 25%
Based on this data, what is the most likely cause of the performance slowdown?
LLM Deployment Strategy for a Startup
Predicting Resource Bottlenecks
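The resource metrics in the "Predicting Resource Bottlenecks" scenario above lend themselves to a simple saturation check: a resource pegged near 100% while others sit idle is the likely bottleneck. This is a minimal sketch; the 90% saturation threshold is an illustrative assumption, not part of the scenario.

```python
# Hypothetical sketch: flag saturated resources from the scenario's metrics.
# The 90% threshold is an illustrative assumption.

METRICS = {
    "GPU processing power": 0.98,
    "GPU memory": 0.95,
    "CPU processing power": 0.15,
    "System memory (RAM)": 0.25,
}

SATURATION_THRESHOLD = 0.90

def find_bottlenecks(metrics: dict, threshold: float = SATURATION_THRESHOLD) -> list:
    """Return the resources whose utilization meets or exceeds the threshold."""
    return [name for name, util in metrics.items() if util >= threshold]

print(find_bottlenecks(METRICS))
# -> ['GPU processing power', 'GPU memory']
```

Both GPU compute and GPU memory are saturated while the CPU and system RAM are mostly idle, which points to a GPU-bound workload as the most likely cause of the slowdown.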