A software team deploys a large language model on a server to power a real-time translation service. During periods of high user traffic, they observe a significant increase in the time it takes for the model to generate a translation. They collect the following average resource usage metrics from the server during these high-traffic periods:
- GPU Processing Power Usage: 98%
- GPU Memory Consumption: 95%
- CPU Processing Power Usage: 15%
- System Memory (RAM) Consumption: 25%
Based on this data, what is the most likely cause of the performance slowdown?
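One way to read metrics like these is to check which resources are running close to their limit and which have plenty of headroom. The following is a minimal Python sketch of that check, not a diagnostic tool: the function name `saturated_resources` and the 90% cutoff are assumptions chosen for illustration, and only the four usage figures come from the question.

```python
# Hypothetical cutoff for "near saturation"; the 90% value is an
# illustrative assumption, not something stated in the question.
SATURATION_THRESHOLD = 90.0


def saturated_resources(usage_percent, threshold=SATURATION_THRESHOLD):
    """Return names of resources at or above the threshold, highest usage first."""
    hot = [(name, pct) for name, pct in usage_percent.items() if pct >= threshold]
    hot.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hot]


# Average usage figures reported for the high-traffic periods.
metrics = {
    "GPU processing power": 98.0,
    "GPU memory": 95.0,
    "CPU processing power": 15.0,
    "System memory (RAM)": 25.0,
}

print(saturated_resources(metrics))
# → ['GPU processing power', 'GPU memory']
```

Resources that appear in the result are candidates for the bottleneck. Resources that do not appear still have spare capacity, so they are unlikely to explain the slowdown.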
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
LLM Deployment Strategy for a Startup
Predicting Resource Bottlenecks