Definition

Formal Definition of LLM Inference

The inference process in Large Language Models (LLMs) is formally defined as finding the most probable output sequence given a user context. Let $\mathbf{x}$ denote the input token sequence (conceptually equivalent to a 'prompt'), comprising $m+1$ tokens $x_0 \dots x_m$, where $x_0$ is the start symbol $\langle \mathrm{SOS} \rangle$. Let $\mathbf{y}$ denote the subsequent output token sequence (the response), comprising $n$ tokens $y_1 \dots y_n$. The output tokens preceding position $i$ are denoted $\mathbf{y}_{<i} = y_1 \dots y_{i-1}$. The primary goal of LLM inference is to find the sequence $\mathbf{y}$ that maximizes the conditional probability $\Pr(\mathbf{y}|\mathbf{x})$ given the context $\mathbf{x}$. Furthermore, the input and output can be concatenated into a single sequence $[\mathbf{x},\mathbf{y}] = x_0 \dots x_m y_1 \dots y_n$ (sometimes written $\mathrm{seq}_{\mathbf{x},\mathbf{y}}$) to compute joint log-probabilities in decoder-only models.
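As a minimal sketch of this definition, the snippet below approximates $\arg\max_{\mathbf{y}} \Pr(\mathbf{y}|\mathbf{x})$ with greedy decoding over a hand-coded toy next-token model. The vocabulary and probability table are illustrative assumptions, not any real LLM; a real model would produce the per-step distribution with a forward pass over the concatenated sequence $[\mathbf{x},\mathbf{y}]$.

```python
import math

# Toy vocabulary; "<SOS>" plays the role of the start symbol x_0.
# The transition table below is a hypothetical stand-in for an LLM's
# conditional distribution Pr(y_i | x, y_<i).
def next_token_probs(seq):
    """Return a toy conditional distribution over the next token."""
    table = {
        "<SOS>": {"the": 0.9, "cat": 0.1},
        "the":   {"cat": 0.8, "sat": 0.2},
        "cat":   {"sat": 0.7, "<EOS>": 0.3},
        "sat":   {"<EOS>": 1.0},
    }
    return table.get(seq[-1], {"<EOS>": 1.0})

def greedy_decode(x, max_len=10):
    """Approximate argmax_y Pr(y|x) by taking the most probable token
    at each step.  The joint log-probability of [x, y] factorizes into
    a sum of per-step log-probabilities, which we accumulate."""
    seq = list(x)
    y, logp = [], 0.0
    while len(y) < max_len:
        probs = next_token_probs(seq)
        tok = max(probs, key=probs.get)   # greedy choice
        logp += math.log(probs[tok])
        seq.append(tok)
        y.append(tok)
        if tok == "<EOS>":
            break
    return y, logp

y, logp = greedy_decode(["<SOS>"])
print(y)     # ['the', 'cat', 'sat', '<EOS>']
print(logp)  # log(0.9 * 0.8 * 0.7)
```

Note that greedy decoding maximizes each conditional locally and so only approximates the global maximizer of $\Pr(\mathbf{y}|\mathbf{x})$; search strategies such as beam search trade compute for a better approximation.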

Updated 2026-05-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models
