Tensor Parallelism

Tensor parallelism is a model parallelism technique in which the operations inside a single computation step are distributed across devices. A standard approach splits a large parameter matrix column-wise into smaller sub-matrices. The input tensor is then multiplied with each sub-matrix separately and in parallel on different workers or devices, and the resulting partial outputs are concatenated to form the complete output.
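The split-multiply-concatenate pattern above can be sketched in NumPy. This is a minimal single-process illustration: the chunks here are computed sequentially in a list comprehension, whereas a real tensor-parallel system would run each chunk on a separate device.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # input batch: 4 examples, 8 features
W = rng.standard_normal((8, 6))   # full parameter matrix

num_devices = 2                   # hypothetical number of parallel workers

# Split W column-wise into one sub-matrix per device.
W_chunks = np.split(W, num_devices, axis=1)

# Each "device" multiplies the same input with its own chunk.
# (Done sequentially here; in practice these run in parallel.)
partials = [x @ W_c for W_c in W_chunks]

# Concatenating the partial results reproduces the full output.
y = np.concatenate(partials, axis=1)
assert np.allclose(y, x @ W)
```

The column-wise split works because each output column of `x @ W` depends only on the corresponding column of `W`, so no communication between workers is needed until the final concatenation.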

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences