Example of Padded Sequences in Fragmented Memory

This diagram illustrates a common scenario in LLM serving: a batch contains sequences of different lengths, such as ⟨SOS⟩ I think this movie is better and ⟨SOS⟩ I really like. To form a uniform batch for processing, the shorter sequence is left-padded with ⟨pad⟩ tokens to match the longest one, yielding ⟨pad⟩ ⟨pad⟩ ⟨pad⟩ ⟨SOS⟩ I really like. Crucially, the diagram also shows that the blocks holding these sequences are stored in non-contiguous physical memory, visualizing the fragmentation that arises from dynamic allocation.
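The two ideas in the diagram can be sketched in code. The snippet below is a minimal illustration, not the course's implementation: `left_pad` builds the uniform batch shown above, and `allocate_blocks` (a hypothetical helper, with an assumed block size of 4 tokens) shows how a paged allocator can place each fixed-size block of a sequence at an arbitrary physical slot, recorded in a per-sequence block table, so logically contiguous tokens need not be physically contiguous.

```python
# Illustrative sketch (assumed, not from the course) of batch padding
# and block-based storage for sequences of different lengths.

PAD, SOS = "<pad>", "<SOS>"
BLOCK_SIZE = 4  # tokens per memory block; value chosen for illustration

def left_pad(batch):
    """Left-pad every sequence with PAD tokens to the longest length."""
    max_len = max(len(s) for s in batch)
    return [[PAD] * (max_len - len(s)) + s for s in batch]

seqs = [
    [SOS, "I", "think", "this", "movie", "is", "better"],
    [SOS, "I", "really", "like"],
]
padded = left_pad(seqs)
# padded[1] is now ['<pad>', '<pad>', '<pad>', '<SOS>', 'I', 'really', 'like']

def allocate_blocks(seq, free_slots):
    """Map each BLOCK_SIZE-token chunk of `seq` to any free physical slot.

    The returned block table is the sequence's only link between its
    logical order and the scattered physical locations of its blocks.
    """
    table = []
    for _ in range(0, len(seq), BLOCK_SIZE):
        table.append(free_slots.pop())  # any free physical block will do
    return table
```

Left-padding keeps the most recent tokens aligned at the right edge of the batch, which is why it appears here rather than right-padding: generation continues from the last position of every row.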

Updated 2026-05-02

Tags: Ch.5 Inference - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences
