Learn Before
Taxonomy of Efficient Transformers
The primary goal of an efficient transformer model is to improve the memory complexity of the self-attention mechanism. The methods or patterns that significantly improve efficiency can be classified as shown below; a brief code sketch of one fixed pattern follows the list.
- Fixed Patterns (FP)
  - Blockwise Patterns
  - Strided Patterns
  - Compressed Patterns
- Combination of Patterns (CP)
- Learnable Patterns (LP)
- Neural Memory
- Low-Rank Methods
- Kernels
- Recurrence
- Downsampling
- Sparse Models and Conditional Computation
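To make the memory point concrete, here is a minimal sketch (not from any particular model; the function names, shapes, and block size are illustrative assumptions) contrasting full self-attention, whose score matrix grows as O(n²) in sequence length n, with a blockwise fixed-pattern variant whose largest intermediate is only b × b per block:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Standard self-attention: the (n, n) score matrix dominates memory.
    scores = q @ k.T / np.sqrt(q.shape[-1])  # shape (n, n)
    return softmax(scores) @ v

def blockwise_attention(q, k, v, block_size):
    # Fixed blockwise pattern: each token attends only within its own block,
    # so the largest intermediate is (block_size, block_size) at a time.
    n = q.shape[0]
    out = np.empty_like(v)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        s = q[start:end] @ k[start:end].T / np.sqrt(q.shape[-1])
        out[start:end] = softmax(s) @ v[start:end]
    return out

n, d = 1024, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
y_full = full_attention(q, k, v)                        # holds a 1024 x 1024 matrix
y_block = blockwise_attention(q, k, v, block_size=128)  # holds 128 x 128 at a time
```

The trade-off in the blockwise variant is that tokens cannot attend across block boundaries, which is why models in the taxonomy above often pair blockwise patterns with strided or global ones (Combination of Patterns).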
Tags
Data Science
Related
High-Performance Computing Improvements for Transformers
Language Model Scaling Problem
Developing Efficient Architectures and Training for Long-Sequence Self-Attention
A startup with a limited computational budget is tasked with building a system to analyze and summarize entire books for a digital library. A key requirement is that the model must process the full context of these very long documents simultaneously. Why would a standard transformer architecture be a poor choice for this specific task, and what is the implication for model selection?
Scaling Limitations of Standard Transformers
Learn After
Transformer models using Fixed Patterns
Transformer models using Combination of Patterns (CP)
Transformer models using Learnable Patterns
Transformer models using Neural Memory
Transformer models using Low-Rank Methods
Transformer models using Kernels
Transformer models using Recurrence
Transformer models using Downsampling
Transformer models using Sparse Models and Conditional Computation