Critique of a Parallelization Strategy
Critique the following proposal. Based on the principles of a two-level, tile-based approach for executing large matrix multiplications on modern hardware, what is the primary flaw in this strategy, and what is the likely negative consequence on performance?
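For orientation, the two levels the question refers to can be sketched roughly as follows. This is a minimal NumPy illustration, not a GPU implementation: the function names `tiled_matmul` and `two_level_matmul`, the column-wise split of the second matrix, the simulated "devices", and the tile size are all assumptions made for this example.

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Low level: multiply one shard tile by tile.

    Each inner step touches only small blocks of a and b, which is the
    access pattern that lets real hardware keep operands in fast memory.
    """
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Accumulate the contribution of one (tile x tile) block pair.
                c[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return c

def two_level_matmul(a, b, num_devices=4, tile=32):
    """High level: split b by columns, one shard per simulated device."""
    shards = np.array_split(b, num_devices, axis=1)
    # Each "device" computes its own column block of the output.
    partials = [tiled_matmul(a, shard, tile) for shard in shards]
    return np.concatenate(partials, axis=1)
```

In this sketch the high-level split decides what each device is responsible for, while the tile size governs how well each device can reuse the data it has already loaded. A proposal that handles only one of the two levels leaves the other as a potential bottleneck.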
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is parallelizing a large matrix multiplication across a cluster of GPUs. They successfully decompose the matrix so that sub-problems fit onto each GPU, avoiding out-of-memory errors. However, profiling reveals that within each GPU, the computational cores are frequently idle, leading to poor overall performance. This suggests a bottleneck where the cores are waiting for data to be fetched from memory. Which component of a two-level, tile-based parallelization strategy is most likely misconfigured or inefficiently implemented?
High-Level Decomposition in Tensor Parallelism
Low-Level Tile-Based Execution in Tensor Parallelism
A team is implementing a large matrix multiplication using a two-level, tile-based approach for parallel processing on multiple hardware units. Match each of the following implementation goals to the level at which it is primarily addressed.