
Sources of prior attention

  • Locality
  • Prior from lower modules
  • Multi-task adapters
  • Attention with only prior: an attention distribution that is independent of the pairwise interaction between inputs; in other words, such models exploit only a prior attention distribution (see the sketch after this list)
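
The bullets above name where a prior attention distribution can come from. Below is a minimal NumPy sketch of the two cases easiest to show in isolation: a locality prior added to content-based attention logits, and a prior-only attention that never compares queries with keys. The Gaussian form of the prior, the function names, and the sigma parameter are illustrative assumptions, not taken from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def locality_prior(seq_len, sigma=2.0):
    """Log-space Gaussian locality prior: positions close to the query position
    receive higher prior attention mass (illustrative choice of prior)."""
    pos = np.arange(seq_len)
    dist_sq = (pos[:, None] - pos[None, :]) ** 2
    return -dist_sq / (2.0 * sigma ** 2)

def attention_with_prior(q, k, v, prior_logits):
    """Content-based scores combined with a prior by adding terms in log space."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + prior_logits
    return softmax(scores) @ v

def attention_only_prior(v, prior_logits):
    """'Attention with only prior': the distribution depends only on the prior,
    not on any pairwise interaction between the inputs."""
    return softmax(prior_logits) @ v

# Usage: compare prior-augmented attention with prior-only attention.
rng = np.random.default_rng(0)
L, d = 6, 4
q, k, v = rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d))
prior = locality_prior(L)
out_mixed = attention_with_prior(q, k, v, prior)  # content scores + locality prior
out_prior_only = attention_only_prior(v, prior)   # independent of q and k
```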


