Two-Level Tile-Based Approach in Tensor Parallelism

On modern GPUs, tensor parallelism is typically implemented with a two-level, tile-based approach. At the higher level, a large matrix multiplication is decomposed into smaller sub-matrix multiplications, each small enough to fit in the memory of a single GPU. At the lower level, each GPU executes its sub-problem with tile-based parallel algorithms that are optimized for the hardware architecture.
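The two levels can be illustrated with a minimal CPU-only sketch in NumPy. It is not how any particular framework implements tensor parallelism: the "devices" are simulated by a Python list, the column-wise split of the second matrix is just one possible partitioning, and the names `tiled_matmul`, `two_level_matmul`, `num_devices`, and `tile` are hypothetical, chosen for illustration only.

```python
import numpy as np


def tiled_matmul(a, b, tile=4):
    """Lower level: blocked matrix multiply that works through tile x tile blocks.

    This loosely mimics how a GPU kernel accumulates an output tile from
    small input tiles; here the loop simply runs sequentially on the CPU.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):          # rows of the output tile
        for j in range(0, n, tile):      # columns of the output tile
            for p in range(0, k, tile):  # accumulate over the shared dimension
                out[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return out


def two_level_matmul(a, b, num_devices=2, tile=4):
    """Higher level: split b column-wise, one shard per simulated device."""
    shards = np.array_split(b, num_devices, axis=1)
    # Each simulated device multiplies the full a by its own shard of b.
    partials = [tiled_matmul(a, shard, tile) for shard in shards]
    # Concatenating the partial outputs column-wise recovers a @ b.
    return np.concatenate(partials, axis=1)


rng = np.random.default_rng(0)
a = rng.standard_normal((6, 10))
b = rng.standard_normal((10, 7))
result = two_level_matmul(a, b, num_devices=3, tile=4)
```

In a real system, each shard would live on a separate GPU and the final concatenation would be a collective communication step; the sketch only shows that splitting first and tiling second reproduces the full product.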

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences