Case Study

Optimizing LLM Serving Configuration

Analyze the two deployment scenarios described below. For each scenario, recommend whether to use a larger or smaller request batch size to optimize performance. Justify your recommendations by explaining the resulting trade-offs between overall processing efficiency (throughput) and the time it takes to get a response for a single request (latency).
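Before answering, it can help to see the trade-off numerically. The sketch below is a toy model, not a real serving benchmark: it assumes each batched forward pass costs a fixed overhead plus a per-request increment, and that every request in a batch waits for the whole batch to finish. All numbers (`fixed_overhead_ms`, `per_request_ms`) are hypothetical.

```python
# Toy model of the batching trade-off in LLM serving.
# Assumptions (hypothetical, for illustration only):
# - one forward pass costs a fixed overhead plus a per-request increment
# - each request in a batch waits for the entire batch to complete

def batch_metrics(batch_size, fixed_overhead_ms=50.0, per_request_ms=5.0):
    """Return (throughput in requests/s, per-request latency in ms)."""
    batch_time_ms = fixed_overhead_ms + per_request_ms * batch_size
    throughput = batch_size / (batch_time_ms / 1000.0)
    latency_ms = batch_time_ms  # every request sees the full batch time
    return throughput, latency_ms

for bs in (1, 8, 64):
    tp, lat = batch_metrics(bs)
    print(f"batch={bs:3d}  throughput={tp:7.1f} req/s  latency={lat:6.1f} ms")
```

Under this model, growing the batch raises throughput (the fixed overhead is amortized over more requests) but also raises per-request latency, which is exactly the trade-off the scenarios ask you to weigh.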


Updated 2025-09-26


Tags

Ch.5 Inference - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Evaluation in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science