Definition

Request Latency

Request Latency is an efficiency metric that measures the total end-to-end time from when a user sends a request to an LLM until the full response is received. This duration includes the time for data transmission, the model's processing time, and the time for the output to be returned to the user.

0

1

Updated 2026-05-05

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related