
Example of Tensor Parallelism in an FFN Sub-layer

An example of tensor parallelism can be observed in the Feed-Forward Network (FFN) sub-layer of a neural network model. Specifically, consider multiplying an input representation tensor $\mathbf{h} \in \mathbb{R}^{d}$ by a large parameter matrix $\mathbf{W}_h \in \mathbb{R}^{d \times d_h}$. To distribute this computation across multiple devices, the parameter matrix $\mathbf{W}_h$ can be sliced vertically (column-wise) into a sequence of $M$ smaller sub-matrices:

$$\mathbf{W}_h = \begin{bmatrix} \mathbf{W}_h^{1} & \mathbf{W}_h^{2} & \dots & \mathbf{W}_h^{M} \end{bmatrix},$$

where each sub-matrix $\mathbf{W}_h^{k}$ has shape $d \times \frac{d_h}{M}$. The input tensor $\mathbf{h}$ is then multiplied with each of the $M$ sub-matrices independently in parallel, and the resulting outputs are concatenated to form the final result.
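The column-wise split described above can be sketched in NumPy. The dimensions and the sequential loop standing in for the parallel devices are illustrative assumptions; in a real system each shard would live on a separate device and the products would run concurrently.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the text):
d, d_h, M = 8, 16, 4               # model dim, hidden dim, number of devices

rng = np.random.default_rng(0)
h = rng.standard_normal(d)          # input representation h, shape (d,)
W_h = rng.standard_normal((d, d_h)) # full parameter matrix W_h, shape (d, d_h)

# Slice W_h vertically (column-wise) into M sub-matrices W_h^k,
# each of shape (d, d_h / M) -- one per device.
shards = np.split(W_h, M, axis=1)

# Each "device" computes h @ W_h^k independently; here we loop
# sequentially to simulate the parallel computation.
partials = [h @ W_k for W_k in shards]   # each partial has shape (d_h / M,)

# Concatenating the partial outputs recovers the full product h @ W_h.
out = np.concatenate(partials)
assert np.allclose(out, h @ W_h)
```

No communication is needed during the matrix multiplication itself; the only coordination point is the final concatenation of the $M$ partial outputs.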


Updated 2026-04-21


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences