Learn Before
Relation
Transformer models using Compressed Patterns
Compressed patterns apply a pooling operator to down-sample the sequence length, making them a form of fixed attention pattern.
Compressed Attention (Liu et al., 2018) uses strided convolution to effectively reduce the sequence length.
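A minimal numpy sketch of this down-sampling idea, substituting non-overlapping mean pooling for the learned strided convolution of Liu et al. (2018); the function names and stride value are illustrative, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress(x, stride):
    # Down-sample the sequence by averaging non-overlapping windows of
    # length `stride` (a stand-in for a learned strided convolution).
    n, d = x.shape
    n_trim = (n // stride) * stride
    return x[:n_trim].reshape(n_trim // stride, stride, d).mean(axis=1)

def compressed_attention(q, k, v, stride=4):
    # Queries stay full-length; keys and values are compressed, so the
    # attention matrix shrinks from (n, n) to (n, n/stride).
    k_c = compress(k, stride)
    v_c = compress(v, stride)
    scores = q @ k_c.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_c

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = compressed_attention(q, k, v, stride=4)
print(out.shape)  # (16, 8)
```

With stride 4, the 16x16 score matrix becomes 16x4, which is the source of the efficiency gain.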
Updated 2022-10-30
Tags
Data Science