Transformer models using Blockwise Patterns
Input sequences are chunked into fixed-size blocks of tokens, which form the local receptive fields. Chunking input sequences into blocks reduces the attention complexity from O(N^2) to O(B^2) per block, where B is the block size and B << N, significantly reducing the cost.
Examples of models using this technique are Blockwise Attention (Qiu et al., 2019) and Local Attention (Parmar et al., 2018); a minimal sketch follows below.
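To make the complexity reduction concrete, here is a minimal sketch of non-overlapping blockwise self-attention in NumPy. It is an illustration under simplifying assumptions (single head, no batching, N divisible by B), not the implementation from either paper; the function name `blockwise_attention` and the shapes are made up for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attention(q, k, v, block_size):
    """Restrict self-attention to non-overlapping blocks of size B.

    q, k, v: arrays of shape (N, d). Each token attends only to tokens
    in its own block, so instead of one (N, N) score matrix we compute
    N/B score matrices of shape (B, B) -- O(B^2) work per block.
    """
    n, d = q.shape
    assert n % block_size == 0, "sketch assumes N divisible by B"
    nb = n // block_size
    # Reshape to (num_blocks, B, d) so each block is independent.
    qb = q.reshape(nb, block_size, d)
    kb = k.reshape(nb, block_size, d)
    vb = v.reshape(nb, block_size, d)
    # Scaled dot-product attention within each block.
    scores = qb @ kb.transpose(0, 2, 1) / np.sqrt(d)  # (nb, B, B)
    weights = softmax(scores, axis=-1)
    out = weights @ vb                                 # (nb, B, d)
    return out.reshape(n, d)

# Hypothetical usage: 16 tokens, model dim 8, block size 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
print(blockwise_attention(x, x, x, block_size=4).shape)  # (16, 8)
```

The reshape is the whole trick: it turns one large quadratic attention over N tokens into N/B small quadratic attentions over B tokens each, at the cost of losing any interaction between tokens in different blocks.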
Updated 2022-10-30
Tags
Data Science