Concept

Transformer Models using Blockwise Patterns

Input sequences are split into fixed-size blocks of tokens, and each block serves as a local receptive field: attention is computed within a block rather than across the whole sequence. Chunking reduces the attention cost from $O(N^2)$ for the full sequence to $O(B^2)$ per block, where $B$ is the block size and $B \ll N$. With $N/B$ blocks, the total cost is $O(NB)$, far below the quadratic cost of full attention. Examples of models using this technique are Blockwise (Qiu et al., 2019) and Local Attention (Parmar et al., 2018).
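The chunking idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from either cited paper: it assumes single-head attention, a sequence length that divides evenly by the block size, and the simplest pattern in which each block attends only to itself. The function names `softmax` and `blockwise_attention` are hypothetical helpers introduced for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def blockwise_attention(q, k, v, block_size):
    """Block-local attention: each token attends only to tokens in its own block.

    q, k: arrays of shape (N, d); v: array of shape (N, d_v).
    Assumes N is a multiple of block_size (an assumption of this sketch).
    """
    n, d = q.shape
    if n % block_size != 0:
        raise ValueError("sequence length must be a multiple of block_size")
    num_blocks = n // block_size

    # Reshape (N, d) -> (N/B, B, d) so every block is handled independently.
    qb = q.reshape(num_blocks, block_size, d)
    kb = k.reshape(num_blocks, block_size, d)
    vb = v.reshape(num_blocks, block_size, v.shape[-1])

    # One (B, B) score matrix per block: N/B * B^2 = N*B scores in total,
    # instead of the N^2 scores of full attention.
    scores = qb @ kb.transpose(0, 2, 1) / np.sqrt(d)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values inside each block, then restore the (N, d_v) shape.
    return (weights @ vb).reshape(n, v.shape[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 4))
    k = rng.normal(size=(8, 4))
    v = rng.normal(size=(8, 4))
    print(blockwise_attention(q, k, v, block_size=4).shape)
```

As a rough sense of scale, with $N = 1024$ and $B = 64$ full attention computes $1024^2 = 1{,}048{,}576$ scores, while this block-local scheme computes $16 \times 64^2 = 65{,}536$. The result is equivalent to full attention under a block-diagonal mask, so information does not flow between blocks in this sketch.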

Updated 2026-06-15

Tags

Data Science