Learn Before
Relation
Transformer models using Sparse Models and Conditional Computation
Sparse models activate only a subset of their parameters for each input, which generally improves the parameter-to-FLOPs ratio.
Switch Transformers (Fedus et al., 2021), ST-MoE (Zoph et al., 2022), GShard (Lepikhin et al., 2020), Product-Key Memory Layers (Lample et al., 2019)
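The conditional-computation idea can be illustrated with a toy top-1 routing layer in the spirit of Switch Transformers. This is a minimal numpy sketch, not the actual implementation from any of the cited papers; all names (`router_w`, `experts`) and the single-matrix "expert" simplification are assumptions for illustration:

```python
import numpy as np

# Toy top-1 routing (Switch-style) sketch: each token is sent to exactly
# one expert, so per-token compute stays roughly constant while total
# parameter count grows linearly with the number of experts.
rng = np.random.default_rng(0)

d_model, num_experts, n_tokens = 8, 4, 5
tokens = rng.normal(size=(n_tokens, d_model))

router_w = rng.normal(size=(d_model, num_experts))           # router weights (assumed name)
experts = rng.normal(size=(num_experts, d_model, d_model))   # each expert simplified to one matrix

logits = tokens @ router_w
choice = logits.argmax(axis=-1)          # top-1 expert index per token

out = np.empty_like(tokens)
for e in range(num_experts):
    mask = choice == e
    out[mask] = tokens[mask] @ experts[e]  # only the chosen expert runs for these tokens

# Parameters touched per token (one expert + router) vs. total parameters:
active_params = d_model * d_model + d_model * num_experts
total_params = num_experts * d_model * d_model + d_model * num_experts
print(active_params, total_params)
```

Because only one of the `num_experts` expert matrices is applied per token, adding experts increases `total_params` without increasing the per-token FLOPs, which is the parameter-to-FLOPs improvement the description refers to.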
Updated 2022-10-30
Tags
Data Science
Related
Transformer models using Fixed Patterns
Transformer models using Combination of Patterns (CP)
Transformer models using Learnable Patterns
Transformer models using Neural Memory
Transformer models using Low-Rank Methods
Transformer models using Kernels
Transformer models using Recurrence
Transformer models using Downsampling