Concept

Padding in Sequence Batching

To handle sequences of varying lengths within a single batch, a technique called padding is often used: special ⟨pad⟩ tokens are appended to the shorter sequences until they match the length of the longest sequence in the batch. This produces a uniform tensor shape, which is required for efficient parallel computation in most deep learning frameworks. For example, given a batch of two sequences whose lengths differ by three, three ⟨pad⟩ tokens are appended to the shorter one, as in the sketch below.
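
The following is a minimal sketch of this idea, written in PyTorch for illustration; the framework choice and the pad-token ID of 0 are assumptions, since real tokenizers define their own pad token.

```python
# Minimal sketch of batch padding. PAD_ID = 0 is an assumed placeholder;
# real tokenizers define their own pad-token ID.
import torch

PAD_ID = 0

def pad_batch(sequences: list[list[int]]) -> torch.Tensor:
    """Append PAD_ID to each sequence until all match the batch's max length."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]
    return torch.tensor(padded)

# Two sequences of lengths 5 and 2: three PAD_ID tokens are appended
# to the shorter one, yielding a uniform 2 x 5 tensor.
batch = pad_batch([[7, 3, 9, 4, 6], [5, 8]])
print(batch)
# tensor([[7, 3, 9, 4, 6],
#         [5, 8, 0, 0, 0]])
```

In practice, the padded positions are usually excluded from attention via a mask so that the ⟨pad⟩ tokens do not influence the model's output.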
