Definition

Formal Definition of LLM Inference

The inference process in Large Language Models (LLMs) is formally defined as finding the most probable output sequence given a user context. Let $\mathbf{x}$ denote the input token sequence (conceptually equivalent to a 'prompt'), comprising $m+1$ tokens $x_0 \dots x_m$, where $x_0$ is the start symbol $\langle \mathrm{SOS} \rangle$. Let $\mathbf{y}$ denote the subsequent output token sequence (the response), comprising $n$ tokens $y_1 \dots y_n$. The output tokens preceding position $i$ are denoted $\mathbf{y}_{<i} = y_1 \dots y_{i-1}$. The primary goal of LLM inference is to find the sequence $\mathbf{y}$ that maximizes the conditional probability $\Pr(\mathbf{y}|\mathbf{x})$ given the context $\mathbf{x}$. Furthermore, the input and output can be concatenated into a single sequence $[\mathbf{x},\mathbf{y}] = x_0 \dots x_m y_1 \dots y_n$ (sometimes written $\mathrm{seq}_{\mathbf{x},\mathbf{y}}$) to compute joint log-probabilities in decoder-only models.
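As a minimal sketch of this definition, the snippet below approximates $\arg\max_{\mathbf{y}} \Pr(\mathbf{y}|\mathbf{x})$ with greedy decoding over a hand-coded toy next-token model. The vocabulary and probability table are illustrative assumptions, not any real LLM; a real model would produce the per-step distribution with a forward pass over the concatenated sequence $[\mathbf{x},\mathbf{y}]$.

```python
import math

# Toy vocabulary; "<SOS>" plays the role of the start symbol x_0.
# The transition table below is a hypothetical stand-in for an LLM's
# conditional distribution Pr(y_i | x, y_<i).
def next_token_probs(seq):
    """Return a toy conditional distribution over the next token."""
    table = {
        "<SOS>": {"the": 0.9, "cat": 0.1},
        "the":   {"cat": 0.8, "sat": 0.2},
        "cat":   {"sat": 0.7, "<EOS>": 0.3},
        "sat":   {"<EOS>": 1.0},
    }
    return table.get(seq[-1], {"<EOS>": 1.0})

def greedy_decode(x, max_len=10):
    """Approximate argmax_y Pr(y|x) by taking the most probable token
    at each step.  The joint log-probability of [x, y] factorizes into
    a sum of per-step log-probabilities, which we accumulate."""
    seq = list(x)
    y, logp = [], 0.0
    while len(y) < max_len:
        probs = next_token_probs(seq)
        tok = max(probs, key=probs.get)   # greedy choice
        logp += math.log(probs[tok])
        seq.append(tok)
        y.append(tok)
        if tok == "<EOS>":
            break
    return y, logp

y, logp = greedy_decode(["<SOS>"])
print(y)     # ['the', 'cat', 'sat', '<EOS>']
print(logp)  # log(0.9 * 0.8 * 0.7)
```

Note that greedy decoding maximizes each conditional locally and so only approximates the global maximizer of $\Pr(\mathbf{y}|\mathbf{x})$; search strategies such as beam search trade compute for a better approximation.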

Updated 2026-05-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models
