High-Level Decomposition in Tensor Parallelism
The first level of the tile-based approach for tensor parallelism on GPUs involves breaking down a large matrix multiplication into smaller, more manageable sub-matrix multiplications. This decomposition is specifically designed to ensure that each sub-problem is small enough to fit within the memory constraints of a single GPU.
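This decomposition can be sketched as follows. A minimal NumPy illustration (the function name, tiling scheme, and tile counts are assumptions for demonstration, not a specific library API): the output C = A @ B is split into a grid of tiles, and each tile is computed from one horizontal slice of A and one vertical slice of B, so a single GPU only needs to hold that slice pair rather than the full matrices.

```python
import numpy as np

def decompose_matmul(A, B, row_tiles, col_tiles):
    """Split C = A @ B into (row_tiles x col_tiles) sub-problems.

    Each sub-problem multiplies a horizontal slice of A by a vertical
    slice of B; in a real tensor-parallel setting, each (i, j) pair
    would be assigned to one GPU sized to fit that slice pair.
    This is an illustrative CPU sketch, not a distributed implementation.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(row_tiles):
        for j in range(col_tiles):
            r0, r1 = i * m // row_tiles, (i + 1) * m // row_tiles
            c0, c1 = j * n // col_tiles, (j + 1) * n // col_tiles
            # Sub-problem (i, j): small enough to fit on one device.
            C[r0:r1, c0:c1] = A[r0:r1, :] @ B[:, c0:c1]
    return C

A = np.random.rand(8, 6)
B = np.random.rand(6, 10)
C = decompose_matmul(A, B, row_tiles=2, col_tiles=5)
```

Note that each sub-problem is independent of the others, which is what allows them to run on separate GPUs without synchronization during the compute phase.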
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A team is parallelizing a large matrix multiplication across a cluster of GPUs. They successfully decompose the matrix so that sub-problems fit onto each GPU, avoiding out-of-memory errors. However, profiling reveals that within each GPU, the computational cores are frequently idle, leading to poor overall performance. This suggests a bottleneck where the cores are waiting for data to be fetched from memory. Which component of a two-level, tile-based parallelization strategy is most likely misconfigured or inefficiently implemented?
High-Level Decomposition in Tensor Parallelism
Low-Level Tile-Based Execution in Tensor Parallelism
A team is implementing a large matrix multiplication using a two-level, tile-based approach for parallel processing on multiple hardware units. Match each of the following implementation goals to the level at which it is primarily addressed.
Critique of a Parallelization Strategy
Learn After
A team is implementing a distributed computing strategy where a very large matrix multiplication is split across multiple processing units. The process repeatedly fails, reporting 'out-of-memory' errors on the individual units, even though the total problem size is well within the combined memory capacity of all units. The network connection between units is stable. Which of the following is the most probable cause of this specific error?
Sizing Sub-Problems in Distributed Computation
Evaluating a Tensor Parallelism Decomposition Strategy