Learn Before
Padding in Sequence Batching
To handle sequences of varying lengths within a single batch, a technique called padding is often used. This involves adding special ⟨pad⟩ tokens to the shorter sequences until they match the length of the longest sequence in the batch. This creates a uniform tensor shape, which is necessary for efficient parallel computation in many deep learning frameworks. The image shows this by adding three ⟨pad⟩ tokens to the shorter of the two sequences.
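The padding step described above can be sketched in a few lines of Python; the function name `pad_batch` and the literal `"<pad>"` token string are assumptions for illustration:

```python
# Minimal sketch of right-padding a batch of token sequences.
# The pad token string and function name are illustrative assumptions.
def pad_batch(sequences, pad_token="<pad>"):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in sequences]

batch = [
    ["The", "cat", "sat"],
    ["A", "quick", "brown", "fox"],
]
padded = pad_batch(batch)
# Every row now has the same length, so the batch can be stacked
# into a single uniform tensor for parallel computation.
```

After padding, the shorter sequence ends with ⟨pad⟩ tokens, and all rows share the length of the longest sequence in the batch.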

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Padding in Sequence Batching
Analyzing Batch Processing Challenges
A language model inference engine receives a batch of two user requests to process simultaneously for improved efficiency. The first request is 'Summarize the main causes of the Industrial Revolution in five points,' and the second is 'Define photosynthesis.' What is the primary computational challenge that arises from combining these specific requests into a single batch?
The Challenge of Variable-Length Sequences in Batch Processing
Learn After
Example of Padded Sequences in Fragmented Memory
A deep learning model is being prepared to process the following three text sequences together in a single batch:
['The', 'cat', 'sat'], ['A', 'quick', 'brown', 'fox'], and ['On', 'the', 'mat']. To ensure all sequences have a uniform length for efficient computation, a special ⟨pad⟩ token is added to the end of the shorter sequences. Which of the following options correctly represents the batch after this process is applied?
Debugging a Batch Processing Error
Consequences of Non-Uniform Sequence Lengths
Efficiency of Batching Sequences with Similar Lengths
Left Padding in LLM Batching