A team is designing a model to analyze genomic sequences that are millions of characters long. They observe that using a standard attention mechanism, where every character potentially attends to every other character, is computationally infeasible. If they switch to a mechanism that enforces a sparse attention weight matrix, what is the fundamental trade-off they are making?
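The trade-off can be made concrete with a small sketch: dense attention computes an n × n score matrix (quadratic in sequence length), while a sparse pattern such as a sliding window computes only O(n · window) scores, at the cost that positions outside each other's window can never attend to one another directly. This is a minimal NumPy illustration; the function names and the sliding-window pattern are assumptions for demonstration, not the card's prescribed mechanism.

```python
import numpy as np

def full_attention(Q, K, V):
    """Dense attention: every position attends to every other (O(n^2) scores)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # (n, n) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def local_attention(Q, K, V, window):
    """Sparse (sliding-window) attention: each position attends only to
    neighbors within `window`, so only O(n * window) scores are computed.
    The cost: pairs outside the window get zero weight and cannot
    exchange information in a single layer."""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        s = Q[i] @ K[lo:hi].T / np.sqrt(d)            # at most 2*window+1 scores
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ V[lo:hi]
    return out

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.standard_normal((3, n, d))

# A window covering the whole sequence recovers dense attention exactly.
assert np.allclose(local_attention(Q, K, V, window=n), full_attention(Q, K, V))

# A small window is cheaper but loses direct long-range interaction,
# so the outputs diverge from dense attention.
assert not np.allclose(local_attention(Q, K, V, window=2), full_attention(Q, K, V))
```

In other words, the team buys tractable compute and memory on million-character inputs by giving up the guarantee that any two positions can interact directly; long-range dependencies must instead propagate through multiple layers or through whatever global connections the sparsity pattern retains.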
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing Computational Bottlenecks in Attention Mechanisms
Interpreting Attention Matrix Structures