Learn Before
Evaluating a Tensor Parallelism Decomposition Strategy
A machine learning engineering team is tasked with performing a large matrix multiplication (C = A x B) across a cluster of GPUs. They propose a high-level decomposition strategy to break the problem down. Based on the parameters provided in the case study, evaluate whether their proposed strategy is viable. Justify your conclusion with specific calculations.
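As a minimal sketch of the kind of calculation such a justification might rest on, the Python below estimates the memory one GPU needs for a single sub-problem when C is split into a grid of output blocks. It is an illustration under stated assumptions, not the case study's method: the function names, the block layout (each GPU holds a row panel of A, a column panel of B, and one block of C), and every number in the example are hypothetical, since the case-study parameters are not given here.

```python
from math import ceil


def sub_problem_bytes(m, k, n, row_splits, col_splits, bytes_per_element):
    """Estimate bytes one GPU needs for one block of C = A x B.

    Assumed layout (hypothetical): A is m x k, B is k x n, and C is cut
    into row_splits x col_splits blocks. Each GPU stores the row panel
    of A, the column panel of B, and the C block it produces.
    """
    rows = ceil(m / row_splits)   # rows of A (and C) per block
    cols = ceil(n / col_splits)   # columns of B (and C) per block
    a_panel = rows * k            # elements in the A row panel
    b_panel = k * cols            # elements in the B column panel
    c_block = rows * cols         # elements in the output block
    return (a_panel + b_panel + c_block) * bytes_per_element


def is_viable(m, k, n, row_splits, col_splits, bytes_per_element, gpu_memory_bytes):
    """A strategy passes this check only if one sub-problem fits on one GPU."""
    needed = sub_problem_bytes(m, k, n, row_splits, col_splits, bytes_per_element)
    return needed <= gpu_memory_bytes


# Hypothetical example: 4096 x 4096 matrices in 2-byte precision, 2 x 2 grid.
needed = sub_problem_bytes(4096, 4096, 4096, 2, 2, 2)
fits = is_viable(4096, 4096, 4096, 2, 2, 2, gpu_memory_bytes=64 * 1024**2)
print(needed, fits)
```

The check compares the per-GPU footprint with the memory of a single GPU rather than with the cluster total, because a decomposition can fit in aggregate memory and still fail if any one sub-problem is too large for its device. It ignores workspace, activations, and communication buffers, so a real evaluation would leave headroom.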
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is implementing a distributed computing strategy where a very large matrix multiplication is split across multiple processing units. The process repeatedly fails, reporting 'out-of-memory' errors on the individual units, even though the total problem size is well within the combined memory capacity of all units. The network connection between units is stable. Which of the following is the most probable cause of this specific error?
Sizing Sub-Problems in Distributed Computation
Evaluating a Tensor Parallelism Decomposition Strategy