Content-based Sparse Attention
A content-based sparse graph is constructed by selecting, for each query, the keys that are likely to have large similarity scores with it. Finding those keys is a form of the Maximum Inner Product Search (MIPS) problem.
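As a rough illustration of the definition above, the following sketch selects, for each query, the k keys with the largest inner products and applies softmax over only those keys. The function name `topk_sparse_attention`, the parameter `k`, and the brute-force search are assumptions made for illustration; practical content-based methods typically replace the exhaustive scoring step with an approximate MIPS technique, which is not shown here.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Hypothetical sketch: each query attends only to its k highest-scoring keys.

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    The key selection is solved exactly by brute force (exact MIPS).
    """
    # Scaled inner products; the positive scale factor does not change the ranking.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k)

    # Indices of the k largest scores in each row (unordered within the top k).
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]

    # Keep only the selected scores; everything else becomes -inf,
    # so it receives zero weight after the softmax.
    masked = np.full_like(scores, -np.inf)
    selected = np.take_along_axis(scores, topk_idx, axis=-1)
    np.put_along_axis(masked, topk_idx, selected, axis=-1)

    # Numerically stable softmax over the surviving entries.
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ V, topk_idx
```

Because the selected indices depend on the values of the queries and keys, the resulting attention graph changes with the input, which is what distinguishes it from a fixed, position-based pattern.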
Tags
Data Science
Related
Content-based Sparse Attention
Positional-based Sparse Attention
Classifying a Novel Sparse Attention Mechanism
An engineer develops a sparse attention mechanism where, for any given token, the set of other tokens it can attend to is defined by a pre-determined, structured pattern based on their relative distance in the sequence. For example, a token might only attend to the 8 tokens immediately preceding it. This attention pattern does not change, regardless of the specific words or meaning of the input text. Based on how the set of attended-to indices is defined, how should this mechanism be classified?
A key characteristic of all sparse attention models is that the set of attended-to indices for a given token is dynamically determined by finding other tokens with the most similar content.