Learn Before
Multiple Choice

An engineer designs a sparse attention mechanism where, for any given token at position i, the model is only allowed to attend to the tokens within a fixed-size window around it (e.g., from position i-k to i+k). This rule is applied uniformly across the entire sequence, irrespective of the specific words involved. Which statement best analyzes the core principle of this design?
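The design in the question can be made concrete with a short sketch. Below is a minimal NumPy illustration of fixed-window (local) attention: the mask depends only on token positions, never on token content, which is the defining property described above. The function names and window parameter are illustrative, not from any specific library.

```python
import numpy as np

def local_attention_mask(seq_len: int, k: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True iff token i may attend to token j,
    i.e. |i - j| <= k. Purely position-based -- content plays no role."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= k

def local_attention(q: np.ndarray, key: np.ndarray, v: np.ndarray, k: int) -> np.ndarray:
    """Scaled dot-product attention restricted to a fixed window of +/- k positions."""
    d = q.shape[-1]
    scores = q @ key.T / np.sqrt(d)
    # Positions outside the window get -inf so softmax assigns them zero weight.
    scores = np.where(local_attention_mask(q.shape[0], k), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the mask is fixed in advance, the cost per token is O(k) rather than O(n), but the model cannot adapt the sparsity pattern to the input.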

Updated 2025-09-26

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science