Learn Before
Example of Tensor Parallelism in an FFN Sub-layer
An example of tensor parallelism can be observed in the Feed-Forward Network (FFN) sub-layer of a neural network model. Specifically, consider multiplying an input representation tensor, denoted as h, by a large parameter matrix, denoted as W, of shape d × d'. To distribute this computation across n devices, the parameter matrix can be sliced vertically (column-wise) into a sequence of smaller sub-matrices, represented mathematically as W = [W_1, W_2, ..., W_n], where each sub-matrix W_i has a shape of d × d'/n. The input tensor is then multiplied with each of these sub-matrices independently in parallel, and the resulting partial outputs hW_1, ..., hW_n are concatenated to form the final outcome hW.
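The slicing scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real multi-device implementation: the "devices" are simulated by a sequential loop, and the dimension names (d, d_ff, n) and the batch size are assumptions chosen for the example.

```python
import numpy as np

# Assumed dimensions for illustration: d is the input width,
# d_ff the FFN hidden size, and n the number of simulated devices.
d, d_ff, n = 8, 16, 4

rng = np.random.default_rng(0)
h = rng.standard_normal((2, d))      # input representation, shape [batch, d]
W = rng.standard_normal((d, d_ff))   # full FFN weight matrix, shape [d, d_ff]

# Slice W vertically (column-wise) into n sub-matrices of shape [d, d_ff/n].
W_slices = np.split(W, n, axis=1)

# Each device would multiply the (replicated) input by its own slice in
# parallel; here the per-device multiplications run one after another.
partial_outputs = [h @ W_i for W_i in W_slices]

# Concatenating the partial outputs along the column axis recovers h @ W.
out = np.concatenate(partial_outputs, axis=1)
```

Because each sub-matrix holds only 1/n of the columns of W, each device needs to store only 1/n of the layer's parameters, which is what makes this slicing useful when a single weight matrix exceeds one GPU's memory.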
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Two-Level Tile-Based Approach in Tensor Parallelism
A machine learning engineer is training a model with an exceptionally large layer. The weight matrix for this single layer is so large that it cannot fit into the memory of one GPU, causing an 'out-of-memory' error during the matrix multiplication step. Which of the following strategies directly addresses this specific memory bottleneck by parallelizing the problematic matrix multiplication itself across multiple devices?
Solving a Memory Bottleneck with Parallelism
Analyzing Distributed Matrix Multiplication Strategies
Example of Tensor Parallelism in an FFN Sub-layer