Example of Padded Sequences in Fragmented Memory
This diagram illustrates a common scenario in LLM serving where a batch contains sequences of varying lengths, such as ⟨SOS⟩ I think this movie is better and I really like. To give the batch a uniform length for processing, the shorter sequence receives a start-of-sequence token and is left-padded with ⟨pad⟩ tokens, yielding ⟨pad⟩ ⟨pad⟩ ⟨pad⟩ ⟨SOS⟩ I really like. Crucially, the image also shows that the data blocks for these sequences are stored in non-contiguous physical memory, visualizing the memory fragmentation that arises from dynamic allocation.
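To make the figure concrete, here is a minimal Python sketch, not taken from the book: only the token strings come from the example above, while the block size and the toy "physical memory" list are illustrative assumptions. It left-pads the shorter sequence to the batch length and then interleaves the two sequences' blocks so that neither is stored contiguously, which is the arrangement the diagram visualizes.

```python
# Minimal sketch: left-pad a batch, then store its blocks non-contiguously.
# BLOCK_SIZE and the toy "physical memory" list are illustrative assumptions;
# only the token strings come from the example above.

BLOCK_SIZE = 4  # tokens per block (assumed)

long_seq = "⟨SOS⟩ I think this movie is better and I really like".split()
short_seq = "⟨SOS⟩ I really like".split()

# Left-pad the shorter sequence so both rows of the batch have equal length.
batch_len = max(len(long_seq), len(short_seq))
padded_short = ["⟨pad⟩"] * (batch_len - len(short_seq)) + short_seq
batch = [long_seq, padded_short]

def to_blocks(tokens, size=BLOCK_SIZE):
    """Split a token list into fixed-size blocks (the last one may be short)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

physical_memory = []          # flat list of blocks, standing in for physical slots
block_table = {0: [], 1: []}  # sequence index -> physical block indices

# Interleave the two sequences' blocks so neither is stored contiguously,
# which is the fragmented layout the diagram depicts.
for blk_long, blk_short in zip(to_blocks(batch[0]), to_blocks(batch[1])):
    block_table[0].append(len(physical_memory))
    physical_memory.append(blk_long)
    block_table[1].append(len(physical_memory))
    physical_memory.append(blk_short)

for seq_id, slots in block_table.items():
    print(f"sequence {seq_id} occupies physical blocks {slots}")
# sequence 0 occupies physical blocks [0, 2, 4]
# sequence 1 occupies physical blocks [1, 3, 5]
```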

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A deep learning model is being prepared to process the following three text sequences together in a single batch: ['The', 'cat', 'sat'], ['A', 'quick', 'brown', 'fox'], and ['On', 'the', 'mat']. To ensure all sequences have a uniform length for efficient computation, a special ⟨pad⟩ token is added to the end of the shorter sequences. Which of the following options correctly represents the batch after this process is applied?
Debugging a Batch Processing Error
Consequences of Non-Uniform Sequence Lengths
Efficiency of Batching Sequences with Similar Lengths
Left Padding in LLM Batching
Example of Padded Sequences in Fragmented Memory
PagedAttention for KV Cache Memory Optimization
An LLM serving system is processing numerous concurrent requests of varying lengths. As requests are completed, their associated memory is freed. After running for some time, the system's overall throughput decreases, and it frequently fails to start processing new, long sequences, even though monitoring tools show that a significant percentage of total memory is free. Based on this scenario, what is the most accurate evaluation of the underlying problem? (See the allocation sketch after this list.)
LLM Memory Allocation Failure Analysis
The Paradox of Free Memory in LLM Serving
You run an internal LLM inference service for empl...
You’re on-call for an internal LLM chat service. M...
You operate a GPU-backed LLM service that uses con...
Your company’s internal LLM service handles many c...
Evaluating a serving design that combines prefix caching with paged KV memory under mixed prompt lengths
Choosing a KV-cache strategy for shared-prefix traffic under GPU memory pressure
Diagnosing and Redesigning KV-Cache Memory Behavior in a Multi-Tenant LLM Serving Stack
Stabilizing latency and GPU memory in a chat-completions service with shared system prompts
Root-cause and mitigation plan for OOMs and latency spikes during shared-prefix, long-generation traffic
Post-incident analysis: KV-cache growth, fragmentation, and shared-prefix reuse in a streaming LLM service
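The scenario above, free memory that still cannot satisfy a new long request, is classic external fragmentation. The following is a minimal Python sketch, not taken from any of the linked questions; the pool size, request lengths, and first-fit policy are illustrative assumptions. It fills a toy pool with contiguous allocations, frees some requests, and shows that a long new request fails even though half of the slots are free.

```python
# Minimal simulation of external fragmentation under contiguous allocation.
# POOL, the request lengths, and the first-fit policy are illustrative
# assumptions; real servers manage KV-cache blocks rather than token slots.

POOL = 16
memory = [None] * POOL  # None = free slot, otherwise a request id

def alloc_contiguous(req_id, length):
    """First-fit allocation of `length` contiguous slots; True on success."""
    run_start, run_len = 0, 0
    for i, slot in enumerate(memory):
        if slot is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                memory[run_start:run_start + length] = [req_id] * length
                return True
        else:
            run_len = 0
    return False

def free(req_id):
    """Release every slot owned by `req_id`."""
    for i, slot in enumerate(memory):
        if slot == req_id:
            memory[i] = None

# Fill the pool with four short requests, then free every other one.
for req_id in range(4):
    alloc_contiguous(req_id, 4)
free(1)
free(3)

print(memory)  # [0, 0, 0, 0, None, None, None, None, 2, 2, 2, 2, None, None, None, None]
print(f"{memory.count(None)} slots free")  # 8 slots free
print(alloc_contiguous(99, 6))  # False: no contiguous run of 6 despite 8 free slots
```

Paged schemes such as PagedAttention (listed above) sidestep this failure mode by mapping each sequence's logical blocks onto whatever physical blocks happen to be free, so an allocation no longer requires a contiguous run.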
Learn After
Consider a system processing two text sequences of different lengths in a single batch. To create a uniform input, the shorter sequence is extended with special ⟨pad⟩ tokens. A visualization of the system's memory reveals that the data blocks for these sequences are stored in non-contiguous physical locations, with gaps of unused memory between them. What is the primary operational challenge illustrated by this non-contiguous storage arrangement?
Inference Server Memory Allocation Failure
Relationship Between Sequence Padding and Memory Inefficiency