Relation

Transformer models using Compressed Patterns

A pooling or convolution operator down-samples the sequence length, yielding a form of fixed attention pattern.

Compressed Attention (Liu et al., 2018) uses strided convolution over keys and values to reduce the effective sequence length.
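A minimal NumPy sketch of the idea, under stated assumptions: queries stay at full length while keys and values are compressed by a strided 1-D convolution before standard attention. All names, shapes, and the random weights are illustrative, not the paper's implementation.

```python
import numpy as np

def strided_conv1d(x, w, stride):
    """1-D strided convolution: (seq, d_in) -> (out_len, d_out)."""
    k = w.shape[0]
    out_len = (x.shape[0] - k) // stride + 1
    out = np.zeros((out_len, w.shape[2]))
    for i in range(out_len):
        seg = x[i * stride : i * stride + k]        # (k, d_in) window
        out[i] = np.einsum('kd,kde->e', seg, w)     # convolve and project
    return out

def compressed_attention(q, k, v, w_k, w_v, stride=3):
    """Full-length queries attend over stride-compressed keys/values."""
    k_c = strided_conv1d(k, w_k, stride)            # shortened key sequence
    v_c = strided_conv1d(v, w_v, stride)            # shortened value sequence
    scores = q @ k_c.T / np.sqrt(q.shape[-1])       # (seq, out_len)
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)                   # row-wise softmax
    return a @ v_c                                  # (seq, d)

# Illustrative usage: 12 key/value positions compress to 4 with stride 3.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((12, 8)) for _ in range(3))
w_k, w_v = (0.1 * rng.standard_normal((3, 8, 8)) for _ in range(2))
out = compressed_attention(q, k, v, w_k, w_v, stride=3)
```

Attention cost drops from O(n²) to O(n · n/stride), since each query scores only the compressed key positions.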


Updated 2022-10-30

Tags

Data Science