Concept
Sparsity for MoE
Sparsity means that, for each input, only a small subset of all the expert sub-networks is activated, which keeps computation efficient. This can be achieved by computing a softmax score for each expert and activating only the top few (top-k) experts.
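A minimal sketch of top-k softmax gating, using only the Python standard library. It assumes the gate logits are already given (in a real MoE layer they would come from a learned gating network applied to the input), that the experts are hypothetical scalar functions, and that the kept scores are renormalised to sum to 1. Some designs keep the raw softmax values instead.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Scores are softmax probabilities over all experts. The kept scores are
    renormalised so the selected weights sum to 1 (an assumption of this sketch).
    """
    scores = softmax(logits)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    kept = sum(scores[i] for i in top)
    return [(i, scores[i] / kept) for i in top]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_logits: Sequence[float],
    k: int = 2,
) -> float:
    # Only the selected experts are evaluated; the others are skipped entirely,
    # which is where the computational saving of sparse activation comes from.
    return sum(w * experts[i](x) for i, w in top_k_gate(gate_logits, k))


# Usage with four hypothetical experts that scale their input by 1, 2, 3, 4.
experts = [lambda x, a=a: a * x for a in range(1, 5)]
gates = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
y = moe_forward(1.0, experts, [0.1, 2.0, -1.0, 1.5], k=2)
```

With these logits, experts 1 and 3 are selected, so experts 0 and 2 are never run. Because softmax is monotonic, ranking by softmax score picks the same experts as ranking by raw logits; the softmax only matters for the mixing weights.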
Updated 2022-06-25
Tags
Data Science