A machine learning team is training a large model with a distributed framework. They upgrade their hardware from 'GPU Architecture X' to 'GPU Architecture Y', which has significantly more raw computational power. To their surprise, the individual pre-decomposed sub-matrix multiplication tasks running on each GPU now execute more slowly. Assuming no issues with networking or cooling, what is the most likely cause of this performance degradation?
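The scenario points at algorithm and hardware co-optimization: a sub-task granularity tuned for Architecture X may be too small to saturate Architecture Y, so fixed per-task costs (kernel launch, scheduling, tile sizes matched to X's memory hierarchy) dominate the per-task time. Below is a minimal sketch, assuming PyTorch and a CUDA-capable GPU, for measuring this effect; the 1024-wide tile is a hypothetical stand-in for the original decomposition, not taken from the question:

# Hypothetical sketch: time one pre-decomposed sub-matrix multiplication
# task to check whether fixed per-task overheads, rather than raw FLOPs,
# dominate on the new hardware.
import torch

def time_subtask(n=1024, iters=100):
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    torch.mm(a, b)                     # warm-up launch
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.mm(a, b)                 # one sub-task per kernel launch
    end.record()
    torch.cuda.synchronize()
    ms_per_task = start.elapsed_time(end) / iters
    flops = 2 * n ** 3                 # multiply-adds in an n x n matmul
    print(f"{ms_per_task:.3f} ms/task, "
          f"{flops / (ms_per_task * 1e-3) / 1e12:.2f} effective TFLOP/s")

if __name__ == "__main__":
    time_subtask()

If the measured effective TFLOP/s sits far below the device's peak, the task is too small to fill the GPU, and per-task latency is set by launch and scheduling overhead rather than compute throughput. This is why a chip with more raw power can run each fixed-size sub-task no faster, or even slower, when the decomposition and kernels were tuned for the old architecture.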
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Framework Design for Parallel Computation
Algorithm and Hardware Co-optimization