Concept

Inference Engine in LLM Systems

The Inference Engine is the component of an LLM system responsible for directly executing the model. It takes requests that have been dequeued from the scheduler and carries out the inference computation, which consists of two stages: prefill, where the full prompt is processed in a single forward pass to build the attention cache, and decode, where output tokens are generated one at a time, each step feeding the previous token back into the model.
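The prefill/decode split can be sketched as follows. This is a minimal illustration, not a real engine: `toy_forward` is a hypothetical stand-in for a model forward pass (a real engine would run a transformer and reuse a KV cache between decode steps).

```python
from typing import List

def toy_forward(tokens: List[int]) -> int:
    # Hypothetical stand-in for a model forward pass:
    # maps the current token sequence to a "next token" id.
    return sum(tokens) % 50

def generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    # Prefill stage: process the entire prompt in one pass.
    # (A real engine builds the KV cache for all prompt tokens here.)
    tokens = list(prompt)
    next_tok = toy_forward(tokens)

    # Decode stage: generate one token per step, appending each
    # new token and running the model again on the extended sequence.
    out = []
    for _ in range(max_new_tokens):
        out.append(next_tok)
        tokens.append(next_tok)
        next_tok = toy_forward(tokens)
    return out
```

The key operational difference the sketch preserves: prefill is one large, parallelizable pass over the whole prompt, while decode is an inherently sequential loop of small per-token steps, which is why the two stages stress the hardware very differently.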


Updated 2026-05-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences