1Cademy - Request Latency

Learn Before

Efficiency Metrics for LLM Evaluation

Definition

Request Latency

Request Latency is an efficiency metric that measures the total end-to-end time from when a user sends a request to an LLM until the full response is received. This duration includes the time for data transmission, the model's processing time, and the time for the output to be returned to the user.

Updated 2026-05-05

Contributors are:

Who are from:

References