Learn Before
Predicting Resource Bottlenecks
A developer attempts to run inference for a very large language model on a machine whose GPU has limited memory; the model's size exceeds the available GPU memory. Describe the likely impact on both GPU memory consumption and CPU processing power usage during inference, and explain the reasoning behind that impact.
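To reason about this scenario, it helps to compare the model's raw weight footprint against the GPU's capacity. The sketch below is a hypothetical back-of-envelope calculator (the parameter count, precision, and GPU size are illustrative assumptions, not values from the question): when the weights cannot fit, frameworks typically offload the overflow to system RAM, which shifts work onto the CPU as tensors are shuttled across the PCIe bus.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone (fp16 = 2 bytes/param)."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical example: a 70B-parameter fp16 model on a 24 GB GPU.
weights_gb = estimate_weight_memory_gb(70e9)
gpu_capacity_gb = 24

if weights_gb > gpu_capacity_gb:
    # The GPU fills to capacity, and the layers that do not fit are
    # offloaded to system RAM; the CPU then spends cycles moving
    # tensors to and from the GPU on every forward pass.
    print(f"Weights need ~{weights_gb:.0f} GB but GPU has {gpu_capacity_gb} GB "
          f"-> expect full GPU memory plus elevated CPU usage from offloading")
```

Note the estimate ignores the KV cache and activations, so real memory pressure is even higher than this lower bound suggests.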
Tags
Ch.5 Inference - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
A software team deploys a large language model on a server to power a real-time translation service. During periods of high user traffic, they observe a significant increase in the time it takes for the model to generate a translation. They collect the following average resource usage metrics from the server during these high-traffic periods:
- GPU Processing Power Usage: 98%
- GPU Memory Consumption: 95%
- CPU Processing Power Usage: 15%
- System Memory (RAM) Consumption: 25%
Based on this data, what is the most likely cause of the performance slowdown?
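A simple way to frame the diagnosis is to ask which resource is closest to saturation. The toy heuristic below is an illustrative sketch, not a real profiling tool; the 90% threshold is an assumption chosen for the example. Fed the metrics above, it points at the resource whose utilization is highest.

```python
def diagnose_bottleneck(gpu_util: float, gpu_mem: float,
                        cpu_util: float, ram: float) -> str:
    """Toy heuristic: the resource nearest saturation is the likely bottleneck."""
    metrics = {
        "GPU compute": gpu_util,
        "GPU memory": gpu_mem,
        "CPU compute": cpu_util,
        "System RAM": ram,
    }
    name, pct = max(metrics.items(), key=lambda kv: kv[1])
    return name if pct >= 90 else "no clear bottleneck"

# Metrics from the high-traffic periods described above.
print(diagnose_bottleneck(gpu_util=98, gpu_mem=95, cpu_util=15, ram=25))
```

With the observed numbers, GPU compute (98%) edges out GPU memory (95%), while CPU and RAM sit far below saturation, which is why the slowdown is attributed to the GPU rather than the host.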
LLM Deployment Strategy for a Startup
Predicting Resource Bottlenecks