Concept

Memory-Compute-Accuracy Triangle in LLM Optimization

Optimizing LLM inference involves a three-way trade-off between memory, compute, and accuracy. This principle, known as the memory-compute-accuracy triangle, posits that improving one dimension often requires a compromise in another. For instance, using lower-precision data formats such as FP16 or INT8 reduces memory usage and bandwidth requirements. However, this gain may come at the cost of reduced accuracy or numerical instability, and recovering the lost accuracy can demand additional computational work, such as recalibration or quantization-aware retraining.
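The trade-off can be made concrete with a minimal sketch of symmetric per-tensor INT8 quantization (an illustrative toy, not any specific library's scheme): the quantized tensor uses a quarter of the memory of FP32, but round-trip dequantization no longer reproduces the original values exactly.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map floats onto the INT8 range [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original floats."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)  # stand-in for a weight tensor

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

mem_fp32 = w.nbytes   # 4 bytes per element
mem_int8 = q.nbytes   # 1 byte per element
err = np.abs(w - w_hat).max()  # worst-case reconstruction error

print(mem_fp32 // mem_int8)  # → 4 (memory shrinks 4x, at the cost of nonzero error)
```

The memory saving is exact (4 bytes down to 1 per element), while `err` is the accuracy cost: it is bounded by half a quantization step (`scale / 2`), and shrinking it further, e.g. via per-channel scales or calibration, is precisely the extra compute the triangle predicts.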


Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences