Concept

Positional-based Sparse Attention

In positional-based sparse attention, the index set G is defined using pre-determined, heuristically designed patterns based on the relative positions of tokens, rather than their content. This means the sparsity pattern is fixed and does not depend on the input values. A common example of such a heuristic pattern is the sliding window, where the set G for a token at position i covers a fixed-size window of nearby tokens.
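As a minimal sketch of the idea (not taken from the source), the code below builds a symmetric sliding-window mask in PyTorch and applies it inside scaled dot-product attention: for each query position i, the set G contains only the keys within a fixed distance, and all other positions are masked out before the softmax. The function names, the non-causal (symmetric) window, and the single-head shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # True marks the positions each query may attend to: for query i,
    # the allowed set G is {j : |i - j| <= window} (symmetric window, an assumption).
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() <= window

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (seq_len, d) single-head tensors (illustrative shapes).
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (seq_len, seq_len)
    mask = sliding_window_mask(q.shape[-2], window)
    scores = scores.masked_fill(~mask, float("-inf"))     # block out-of-window pairs
    return F.softmax(scores, dim=-1) @ v

# Usage: 8 tokens, 16-dim head, window of 2 tokens on each side.
q, k, v = (torch.randn(8, 16) for _ in range(3))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # torch.Size([8, 16])
```

Because the mask depends only on token positions, it can be precomputed once for a given sequence length and reused for every input, which is what makes positional patterns cheap compared to content-based sparsity.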
