Concept
Transformer Models using Blockwise Patterns
Input sequences are split into fixed-size blocks of tokens, and each block forms a local receptive field within which attention is computed. Chunking a sequence of length $N$ into blocks reduces the attention complexity from $O(N^2)$ to $O(B^2)$ per block (where $B$ is the block size and $B \ll N$), which is $O(NB)$ over all $N/B$ blocks and significantly reduces the computational cost. Examples of models using this technique are Blockwise (Qiu et al., 2019) and Local Attention (Parmar et al., 2018).
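The chunking idea can be illustrated with a minimal pure-Python sketch. It assumes the simplest variant, in which each token attends only to tokens in its own block, and it is not the exact scheme of either cited model. The names `attend`, `blockwise_attention`, and `score_count` are hypothetical helpers introduced here for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Plain scaled dot-product attention over lists of vectors.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def blockwise_attention(queries, keys, values, block_size):
    # Assumption: each block attends only to itself (no cross-block links).
    out = []
    for start in range(0, len(queries), block_size):
        end = min(start + block_size, len(queries))
        out.extend(attend(queries[start:end], keys[start:end], values[start:end]))
    return out


def score_count(n, block_size=None):
    # Number of query-key scores computed for a sequence of length n.
    if block_size is None:
        return n * n  # full attention
    full, rest = divmod(n, block_size)
    return full * block_size ** 2 + rest ** 2  # sum of per-block squares


print(score_count(1024))      # full attention: N^2 scores
print(score_count(1024, 64))  # blockwise: (N / B) * B^2 = N * B scores
```

With $N = 1024$ and $B = 64$, the sketch computes 16 blocks of $64^2$ scores each, a 16-fold reduction relative to full attention. The trade-off is that tokens in different blocks cannot attend to each other unless the model adds some other mechanism for cross-block interaction.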
Updated 2026-06-15
Tags
Data Science