Definition

Inter-token Latency (ITL)

Inter-token Latency (ITL) is an efficiency metric that measures the time required to generate each output token after the first one, that is, the interval between consecutive tokens during decoding. It is a key indicator of the performance of the model's decoding process.
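As an illustration, ITL could be estimated from the arrival times of streamed tokens. The sketch below is a minimal example, not a standard implementation: the function names `inter_token_latencies` and `mean_itl` and the sample timestamps are hypothetical. It assumes each token has a recorded arrival time in seconds and that ITL is reported as the mean gap between consecutive tokens.

```python
from typing import Sequence


def inter_token_latencies(timestamps: Sequence[float]) -> list[float]:
    """Return the gaps (in seconds) between consecutive token arrival times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def mean_itl(timestamps: Sequence[float]) -> float:
    """Average inter-token latency over one generated sequence."""
    gaps = inter_token_latencies(timestamps)
    if not gaps:
        # With fewer than two tokens there is no interval to measure.
        raise ValueError("need at least two tokens to compute ITL")
    return sum(gaps) / len(gaps)


# Hypothetical arrival times (seconds since the request was sent) for 5 tokens.
# The first value is when the initial token arrived; ITL looks only at the
# intervals that follow it.
arrivals = [0.50, 0.75, 1.00, 1.25, 1.50]
print(mean_itl(arrivals))
```

Averaging is only one possible summary; percentiles of the individual gaps can be more informative when latency varies across the sequence.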

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
