Relation

Transformer models using Blockwise Patterns

Input sequences are split into fixed, non-overlapping blocks of tokens, which form the local receptive fields. Chunking the input this way reduces the attention complexity from O(N^2) to O(B^2) per block, where B is the block size and B << N, significantly reducing the cost.
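A minimal NumPy sketch of the idea (function and variable names are illustrative, not from any particular library): each query attends only to keys and values inside its own block, so attention is computed on B×B score matrices instead of one N×N matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attention(q, k, v, block_size):
    """Attention restricted to non-overlapping blocks: each query attends
    only to keys in its own block, so scores are (B, B) per block rather
    than a single (N, N) matrix."""
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be divisible by block size"
    out = np.empty_like(v)
    for start in range(0, n, block_size):
        sl = slice(start, start + block_size)
        scores = q[sl] @ k[sl].T / np.sqrt(d)  # (B, B) block of scores
        out[sl] = softmax(scores) @ v[sl]      # weighted sum within the block
    return out

# Example: sequence of 8 tokens, model dim 4, block size 4.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = blockwise_attention(q, k, v, block_size=4)
```

The result is identical to full attention under a block-diagonal mask; what changes is that the quadratic cost is paid per block rather than over the whole sequence.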

Examples of models using this technique are

Blockwise Attention (Qiu et al., 2019) and Local Attention (Parmar et al., 2018).


Updated 2022-10-30

Tags

Data Science