Learn Before
Relation

Transformer models using sparsity and conditional computation

Sparse models activate only a subset of their parameters for each input, which generally improves the parameter-to-FLOPs ratio: model capacity grows with the total parameter count while the compute per token stays roughly constant.

Switch Transformers (Fedus et al., 2021), ST-MoE (Zoph et al., 2022), GShard (Lepikhin et al., 2020), Product-Key Memory Layers (Lample et al., 2019)
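A minimal sketch of the top-1 routing idea behind Switch Transformers, using only NumPy: a learned router picks one expert feed-forward network per token, so only that expert's parameters are activated. All names, dimensions, and weights here are illustrative, not the papers' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, n_tokens = 8, 16, 4, 5

# Illustrative router and per-expert feed-forward weights.
W_router = rng.normal(size=(d_model, n_experts))
W_in = rng.normal(size=(n_experts, d_model, d_ff))
W_out = rng.normal(size=(n_experts, d_ff, d_model))

def switch_layer(x):
    """Top-1 (Switch-style) routing: each token activates exactly one expert."""
    logits = x @ W_router                          # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)          # softmax over experts
    expert = probs.argmax(-1)                      # chosen expert per token
    gate = probs[np.arange(len(x)), expert]        # gate value scales output
    y = np.empty_like(x)
    for i, e in enumerate(expert):
        h = np.maximum(x[i] @ W_in[e], 0.0)        # expert FFN with ReLU
        y[i] = gate[i] * (h @ W_out[e])
    return y, expert

x = rng.normal(size=(n_tokens, d_model))
y, expert = switch_layer(x)
print(y.shape, expert)
```

Note that the layer holds `n_experts` full FFNs' worth of parameters, yet each token pays the FLOPs cost of just one FFN plus the small router matmul, which is the parameter-to-FLOPs advantage described above.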


Updated 2022-10-30

Tags

Data Science