Multiple Choice

A machine learning team is training a model whose layers are partitioned and distributed across 8 specialized processing units because the full model is too large for a single unit. During training, they observe that at any given moment in the forward or backward pass, only one unit is actively computing its assigned layers while the other 7 are idle, waiting for their turn. This sequential processing leads to poor overall hardware utilization. Which of the following strategies would most effectively address this specific inefficiency?
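
To make the described inefficiency concrete, the following minimal sketch computes hardware utilization for the scenario in the question. It assumes, purely for illustration, that every unit takes one equal-length time step for its forward work and one for its backward work, and that a single batch moves through the units one at a time. The function names `sequential_schedule` and `utilization` are hypothetical and do not come from any library.

```python
def sequential_schedule(num_units: int) -> list[list[int]]:
    """Build a timeline with one row per time step and one column per unit.

    A cell is 1 if that unit is computing during that step, else 0.
    Assumption: each unit needs exactly one step for forward and one for backward.
    """
    # Forward pass visits units 0..n-1, then backward pass visits n-1..0.
    order = list(range(num_units)) + list(reversed(range(num_units)))
    return [
        [1 if unit == active else 0 for unit in range(num_units)]
        for active in order
    ]


def utilization(timeline: list[list[int]]) -> float:
    """Fraction of (time step, unit) slots in which a unit is busy."""
    busy = sum(sum(row) for row in timeline)
    slots = len(timeline) * len(timeline[0])
    return busy / slots


timeline = sequential_schedule(8)
print(f"steps: {len(timeline)}, utilization: {utilization(timeline):.3f}")
```

Under these assumptions, exactly one of the 8 units is busy in each of the 16 time steps, so utilization is 1/8 and each unit sits idle for 7/8 of the pass.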

Updated 2025-10-01

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science