Multiple Choice

A machine learning engineer is preparing data for a language model. To process multiple text sequences at once, the sequences are grouped into a 'batch'. All sequences within a batch are padded to the length of the longest sequence by appending non-informative 'padding' tokens. To maximize computational throughput, the engineer wants to minimize the number of padding tokens processed. Which of the following batches is configured for the most efficient processing?
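The idea the question tests can be sketched directly: a batch is efficient when its sequences are close in length, because each batch is padded up to its longest member. The sequence lengths and groupings below are hypothetical, chosen only to illustrate the comparison:

```python
# Hypothetical token lengths of six sequences; illustrative only.
lengths = [4, 9, 5, 10, 3, 8]

def padding_tokens(batches):
    """Total padding tokens when each batch is padded to its longest sequence."""
    return sum(max(b) * len(b) - sum(b) for b in batches)

# Batching in arrival order mixes short and long sequences.
arrival_order = [[4, 9], [5, 10], [3, 8]]

# Sorting by length first groups similar-length sequences together.
sorted_batches = [[3, 4], [5, 8], [9, 10]]

print(padding_tokens(arrival_order))   # 15 padding tokens
print(padding_tokens(sorted_batches))  # 5 padding tokens
```

With the same sequences and the same batch size, length-sorted batching cuts the padding processed per step, which is why the most efficient batch is the one whose sequences have the most uniform lengths.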

Updated 2025-10-07

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science