Concept
Strengthening Cross-Block Connectivity in Transformers
The main idea of this concept is to create a forward path between adjacent Transformer blocks. A few ways to implement this are:
- reuse the attention distributions from the previous block to guide the attention of the current block (see the first sketch below)
- have the decoder attend to a weighted sum of the encoder representations at all encoder layers (see the second sketch below)
- add a feedback mechanism to the Transformer decoder, where each position attends to a weighted sum of its history representations from all layers (the second sketch covers this weighted sum as well)
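
Below is a minimal sketch of the first method, assuming PyTorch. The class name `ResidualAttentionBlock`, the multi-head shapes, and the choice to combine scores by simple addition before the softmax are illustrative assumptions, not the exact recipe of any particular paper.

```python
import math
import torch
import torch.nn as nn


class ResidualAttentionBlock(nn.Module):
    """Self-attention whose raw scores are biased by the previous block's scores."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, prev_scores=None):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (batch, heads, time, d_head).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if prev_scores is not None:
            # Forward path between adjacent blocks: the previous block's
            # pre-softmax scores guide the current block's attention.
            scores = scores + prev_scores
        attn = scores.softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        # Hand the scores back so the next block can reuse them.
        return self.out(y), scores
```

Each block returns its pre-softmax scores alongside its output, so chaining blocks only requires threading that extra tensor through the stack:

```python
blocks = nn.ModuleList(ResidualAttentionBlock(512, 8) for _ in range(6))
x, scores = torch.randn(2, 16, 512), None
for block in blocks:
    x, scores = block(x, scores)  # each block sees its predecessor's scores
```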
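
The second and third methods both reduce to a learned weighted sum over per-layer representations. Here is a minimal sketch, again assuming PyTorch; the name `LayerAggregator` and the scalar-softmax weighting are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


class LayerAggregator(nn.Module):
    """Learned scalar mixture of representations from every layer."""

    def __init__(self, n_layers):
        super().__init__()
        # One learnable logit per layer; the softmax keeps the mixture normalized.
        self.logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, time, d_model) tensors, one per layer.
        w = self.logits.softmax(dim=0)               # (layers,)
        stacked = torch.stack(layer_outputs, dim=0)  # (layers, batch, time, d_model)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)
```

Fed the encoder's per-layer outputs, this produces the mixed memory the decoder attends to (second method); applied to a position's hidden states across all decoder layers, the same module yields the feedback memory that later positions attend to (third method).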
Updated 2022-05-26
Tags
Data Science