Multiple Choice

An engineer develops a sparse attention mechanism where, for any given token, the set of other tokens it can attend to is defined by a pre-determined, structured pattern based on their relative distance in the sequence. For example, a token might only attend to the 8 tokens immediately preceding it. This attention pattern does not change, regardless of the specific words or meaning of the input text. Based on how the set of attended-to indices is defined, how should this mechanism be classified?
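The pattern described in the question can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `attended_indices` and `build_mask` are hypothetical, the window of 8 comes from the question's example, and a token is assumed not to attend to itself because the example mentions only the preceding tokens.

```python
def attended_indices(i, window=8):
    """Return the positions token i may attend to.

    Assumption: only the `window` positions immediately before i,
    clipped at the start of the sequence; i itself is excluded.
    """
    start = max(0, i - window)
    return list(range(start, i))


def build_mask(seq_len, window=8):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in attended_indices(i, window):
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    # Print the pattern for a short sequence with a window of 3.
    for row in build_mask(6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Neither function receives the tokens themselves, only positions and the sequence length, so any two inputs of the same length yield the same mask.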

Updated 2025-10-06

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science