Sources of prior attention
- Locality
- Prior from lower modules
- Multi-task adapters
- Attention with only prior -> an attention distribution that is independent of pairwise interactions between the inputs; in other words, such models exploit only a prior attention distribution (see the sketch below)
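A minimal NumPy sketch of the idea, assuming a Gaussian locality prior that is added to the attention logits before the softmax (one common way to combine a prior with content-based scores). The function names, the `sigma` parameter, and the toy shapes are illustrative, not from the source:

```python
import numpy as np

def locality_prior(n, sigma=1.0):
    """Gaussian locality prior in log space: favors positions j near i."""
    idx = np.arange(n)
    dist2 = (idx[None, :] - idx[:, None]) ** 2  # (j - i)^2
    return -dist2 / (2.0 * sigma ** 2)          # added to logits before softmax

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prior(Q, K, V, prior_logits=None):
    """Scaled dot-product attention with an optional additive prior on the logits."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)       # pairwise (content-based) interaction
    if prior_logits is not None:
        logits = logits + prior_logits  # combine content scores with the prior
    return softmax(logits) @ V

# Toy example: 5 positions, 4-dimensional states.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
prior = locality_prior(5, sigma=1.0)

out_combined = attention_with_prior(X, X, X, prior_logits=prior)

# "Attention with only prior": drop the Q.K term entirely, so the
# attention distribution ignores pairwise input interactions.
out_prior_only = softmax(prior) @ X
```

Dropping the content term makes the attention weights a fixed function of position alone, which is exactly the independence from pairwise interaction described in the last bullet.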
Updated 2022-05-20
Tags
Data Science