Quantization for LLM Inference

Quantization is a model compression technique that optimizes LLM inference by reducing the numerical precision of the model's parameters. This reduces memory usage and speeds up computation, but it comes with a trade-off: lowering precision can cause small degradations in model accuracy.
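
As a minimal illustration of the idea, the sketch below applies symmetric per-tensor 8-bit quantization to a weight matrix using NumPy. The function names and the toy matrix are illustrative assumptions, not part of any particular inference library.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float weights to int8 (illustrative)."""
    # Scale maps the largest absolute weight onto the int8 limit (127).
    scale = np.abs(weights).max() / 127.0
    # Round to the nearest integer grid point and clamp to the int8 range.
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: a small float32 matrix standing in for one layer's weights.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops 4x (float32 -> int8); the rounding error is the accuracy cost.
print("max quantization error:", np.abs(w - w_hat).max())
```

Practical LLM quantization schemes typically refine this basic recipe, for example by computing scales per channel or per small group of weights and by using even lower-precision formats such as 4-bit integers, to keep the accuracy loss small.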

Updated 2026-05-05


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences