
Example of Tensor Parallelism in an FFN Sub-layer

An example of tensor parallelism can be observed in the Feed-Forward Network (FFN) sub-layer of a neural network model. Specifically, consider multiplying an input representation tensor $\mathbf{h} \in \mathbb{R}^{d}$ by a large parameter matrix $\mathbf{W}_h \in \mathbb{R}^{d \times d_h}$. To distribute this computation across multiple devices, the parameter matrix $\mathbf{W}_h$ can be sliced vertically (column-wise) into a sequence of $M$ smaller sub-matrices:

$$\mathbf{W}_h = \begin{bmatrix} \mathbf{W}_h^{1} & \mathbf{W}_h^{2} & \dots & \mathbf{W}_h^{M} \end{bmatrix},$$

where each sub-matrix $\mathbf{W}_h^{k}$ has shape $d \times \frac{d_h}{M}$. The input tensor $\mathbf{h}$ is then multiplied with each of the $M$ sub-matrices independently in parallel, and the resulting outputs are concatenated to form the final result.
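The column-wise split described above can be sketched in NumPy. The dimensions and the sequential loop standing in for the parallel devices are illustrative assumptions; in a real system each shard would live on a separate device and the products would run concurrently.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the text):
d, d_h, M = 8, 16, 4               # model dim, hidden dim, number of devices

rng = np.random.default_rng(0)
h = rng.standard_normal(d)          # input representation h, shape (d,)
W_h = rng.standard_normal((d, d_h)) # full parameter matrix W_h, shape (d, d_h)

# Slice W_h vertically (column-wise) into M sub-matrices W_h^k,
# each of shape (d, d_h / M) -- one per device.
shards = np.split(W_h, M, axis=1)

# Each "device" computes h @ W_h^k independently; here we loop
# sequentially to simulate the parallel computation.
partials = [h @ W_k for W_k in shards]   # each partial has shape (d_h / M,)

# Concatenating the partial outputs recovers the full product h @ W_h.
out = np.concatenate(partials)
assert np.allclose(out, h @ W_h)
```

No communication is needed during the matrix multiplication itself; the only coordination point is the final concatenation of the $M$ partial outputs.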


Updated 2026-04-21


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences