Activity (Process)

High-Level Decomposition in Tensor Parallelism

The first level of the tile-based approach to tensor parallelism on GPUs decomposes a large matrix multiplication into smaller sub-matrix multiplications. The sub-problems are sized so that the operands and result of each one fit within the memory of a single GPU.
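The decomposition described above can be illustrated with a minimal sketch in plain Python. It is not taken from the source: the function names `matmul` and `tiled_matmul` and the `tile` parameter are hypothetical, matrices are plain lists of rows, and everything runs sequentially on the CPU. Each iteration of the two outer loops corresponds to one sub-problem that, in a real system, could be assigned to a separate GPU.

```python
def matmul(a, b):
    """Reference matrix multiplication on lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def tiled_matmul(a, b, tile):
    """Compute a @ b one output block at a time (illustrative only).

    `tile` is an assumed block edge length; in practice it would be chosen
    so that the blocks involved fit in a single device's memory.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # block row of the output
        for j0 in range(0, n, tile):      # block column of the output
            # Sub-problem: the output block starting at (i0, j0).
            for k0 in range(0, k, tile):  # accumulate over the shared dimension
                a_blk = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_blk = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                partial = matmul(a_blk, b_blk)
                for di, prow in enumerate(partial):
                    for dj, value in enumerate(prow):
                        c[i0 + di][j0 + dj] += value
    return c


# Small usage example: a 3x4 matrix times a 4x5 matrix, with 2x2 tiles.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
B = [[1, 0, 2, 0, 3], [0, 1, 0, 2, 0], [4, 0, 5, 0, 6], [0, 7, 0, 8, 0]]
assert tiled_matmul(A, B, 2) == matmul(A, B)
```

The sketch only shows that the block-wise result matches the full product. How blocks are placed on devices and how partial results are communicated is outside its scope.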


Updated 2026-04-21
