Concept

Strengthening Cross-Block Connectivity in Transformers

The main idea of this concept is to create forward paths between Transformer blocks beyond the standard layer-to-layer flow. A few ways to implement this are:

  • reuse attention distributions from the previous block to guide the attention of the current block
  • use a weighted sum of encoder representations from all encoder layers
  • add a feedback mechanism to the Transformer decoder, where each position attends to a weighted sum of history representations from all layers
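The second method can be sketched concretely. Below is a minimal, illustrative numpy implementation of combining the outputs of all encoder layers with a softmax-normalized set of learnable scalar weights; the function and parameter names (`combine_layer_outputs`, `layer_logits`) are hypothetical, not from any specific paper or library.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - np.max(x))
    return e / e.sum()

def combine_layer_outputs(layer_outputs, layer_logits):
    """Weighted sum of representations from all encoder layers.

    layer_outputs: list of L arrays, each of shape (seq_len, d_model),
                   one per encoder layer
    layer_logits:  array of shape (L,) of learnable scores (in a real
                   model these would be trained parameters)
    """
    weights = softmax(layer_logits)        # distribution over layers
    stacked = np.stack(layer_outputs)      # (L, seq_len, d_model)
    # contract the layer axis: sum_l weights[l] * stacked[l]
    return np.tensordot(weights, stacked, axes=1)  # (seq_len, d_model)

# toy usage: 3 layers, sequence length 4, model dimension 8
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 8)) for _ in range(3)]
logits = np.zeros(3)  # equal logits -> uniform average over layers
combined = combine_layer_outputs(layers, logits)
```

With zero logits the weights are uniform, so the result is simply the mean of the layer outputs; during training the logits would shift to favor the most useful layers.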


Updated 2022-05-26


Tags

Data Science