Framework Design for Parallel Computation
A research lab has two options for implementing the low-level execution of sub-matrix multiplications in its new distributed computing framework. Option A uses a single, general-purpose parallel algorithm that is compatible with any GPU. Option B involves developing and maintaining separate, highly tuned algorithms, each optimized for the specific hardware architecture of a GPU model the lab uses (e.g., one for 'GPU-V' and another for 'GPU-A'). Option B will require significantly more initial development and ongoing maintenance effort. Based on the goal of maximizing computational efficiency, evaluate the two options and justify which one is the superior choice.
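The structural difference between the two options can be sketched as a kernel-dispatch pattern: Option B keeps one implementation per architecture and selects it at runtime, while Option A's single portable algorithm serves as the fallback. This is a minimal illustrative sketch; the GPU model names come from the question, but all function names and the placeholder kernel bodies are hypothetical.

```python
def generic_matmul(a, b):
    """Option A: one portable baseline that runs on any GPU
    (here, a plain triple-loop sub-matrix multiply on lists)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def tuned_matmul_gpu_v(a, b):
    """Stand-in for a kernel tuned to 'GPU-V' (e.g., its tile sizes,
    memory hierarchy, and instruction mix). Placeholder body."""
    return generic_matmul(a, b)

def tuned_matmul_gpu_a(a, b):
    """Stand-in for a kernel tuned to 'GPU-A'. Placeholder body."""
    return generic_matmul(a, b)

# Option B: one maintained entry per supported architecture.
TUNED_KERNELS = {
    "GPU-V": tuned_matmul_gpu_v,
    "GPU-A": tuned_matmul_gpu_a,
}

def matmul_for(gpu_model, a, b):
    """Dispatch to the architecture-specific kernel when one exists,
    falling back to the general-purpose algorithm otherwise."""
    kernel = TUNED_KERNELS.get(gpu_model, generic_matmul)
    return kernel(a, b)
```

The maintenance cost of Option B is visible in this sketch: every new GPU model means writing and upkeeping another entry in the dispatch table, whereas Option A is a single code path.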
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning team is training a large model using a distributed framework. They upgrade their hardware from 'GPU Architecture X' to 'GPU Architecture Y', which has significantly more raw computational power. To their surprise, the execution speed of the individual, pre-decomposed sub-matrix multiplication tasks running on each GPU decreases. Assuming no issues with networking or cooling, what is the most likely cause of this performance degradation?
Algorithm and Hardware Co-optimization