Multiple Choice

An engineering team is optimizing a large-scale text generation service that processes long user prompts by breaking them into sequential segments. The team observes that while the service can handle a high volume of concurrent requests (high throughput), individual users complain about a noticeable delay before the first word of a response appears (high latency). The processing time for each segment is currently much longer than the time required to generate a single output word. Which of the following actions is the most effective first step to address the high latency issue?
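
The difference between latency and throughput in this scenario can be made concrete with a toy model. The sketch below is illustrative only. It assumes that every prompt segment is processed strictly one after another before the first output token is produced, and that each segment has a fixed cost. The function name `time_to_first_token` and all the numbers are hypothetical.

```python
import math


def time_to_first_token(prompt_len: int, segment_len: int,
                        segment_time_ms: float, token_time_ms: float) -> float:
    """Toy estimate of the delay before the first output word appears.

    Hypothetical model: the prompt is split into ceil(prompt_len / segment_len)
    segments, each segment costs a fixed segment_time_ms and is processed
    sequentially, and only then is the first token generated (token_time_ms).
    """
    num_segments = math.ceil(prompt_len / segment_len)
    return num_segments * segment_time_ms + token_time_ms


# Hypothetical numbers: segment processing is much slower than one decode step.
ttft = time_to_first_token(prompt_len=4096, segment_len=1024,
                           segment_time_ms=200.0, token_time_ms=20.0)
print(ttft)  # 4 segments * 200 ms + 20 ms = 820.0
```

Under these assumptions, nearly all of the user-visible delay comes from segment processing rather than from generating the first word. The model says nothing about throughput, which depends on how many requests are served concurrently.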


Updated 2025-09-29


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science