A machine learning engineer is preparing data for a language model. To process multiple text sequences at once, they must be grouped into a 'batch'. All sequences within a batch are made equal in length to the longest sequence by adding non-informative 'padding' tokens. To maximize computational throughput, the engineer wants to minimize the processing of these padding tokens. Which of the following batches is configured for the most efficient processing?
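Since the answer options are not reproduced here, the sketch below uses hypothetical candidate batches to show how such a comparison works: for each batch, count how many padding tokens are needed to bring every sequence up to the length of the longest one. The batch whose sequences are closest in length wastes the fewest padding tokens. The candidate lengths and the `padding_stats` helper are illustrative assumptions, not part of the original question.

```python
def padding_stats(seq_lengths):
    """Return (padding_tokens, total_tokens, padding_fraction) for a batch
    padded to the length of its longest sequence."""
    max_len = max(seq_lengths)
    total_tokens = max_len * len(seq_lengths)   # every sequence padded to max_len
    real_tokens = sum(seq_lengths)              # informative tokens only
    padding_tokens = total_tokens - real_tokens
    return padding_tokens, total_tokens, padding_tokens / total_tokens

# Hypothetical candidate batches: grouping sequences of similar length
# ("length bucketing") wastes far fewer tokens than mixing short and long ones.
candidates = {
    "mixed short and long": [12, 15, 18, 64],
    "similar lengths":      [60, 61, 62, 64],
}

for name, lengths in candidates.items():
    pad, total, frac = padding_stats(lengths)
    print(f"{name}: {pad} padding tokens out of {total} ({frac:.1%})")
```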
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A batch of four text sequences is being prepared for processing by a language model. The lengths of the sequences are 25, 28, 30, and 60 tokens. To process them together, all sequences must be extended to the length of the longest one by adding non-informative 'padding' tokens. What percentage of the total tokens in the final prepared batch consists of these non-informative padding tokens?
Optimizing Batch Processing Strategy
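For the related padding-percentage question listed above, the arithmetic can be checked with a few lines of Python. This is a minimal sketch; the sequence lengths are taken from that question, and the one-decimal rounding is an assumption.

```python
# Worked check for the related question: sequences of 25, 28, 30, and 60 tokens
# are all padded to the length of the longest sequence (60 tokens).
lengths = [25, 28, 30, 60]
padded_len = max(lengths)                     # 60
total_tokens = padded_len * len(lengths)      # 4 * 60 = 240 tokens in the batch
real_tokens = sum(lengths)                    # 143 informative tokens
padding_tokens = total_tokens - real_tokens   # 97 padding tokens
print(f"padding share: {padding_tokens / total_tokens:.1%}")  # about 40.4%
```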