Analyzing Computational Bottlenecks in Attention Mechanisms
Based on the scenario described, identify the likely structure of the model's attention weight matrix and explain why it is causing the observed performance issues. Then, propose an alternative structure that would be more suitable for this task and justify your choice by contrasting it with the original.
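To make the computational stakes concrete, here is a minimal sketch (function names are illustrative, not from the original) contrasting the number of query-key interactions in a dense attention weight matrix, where every token attends to every other token, with a sliding-window sparse matrix, where each token attends only to a fixed window of neighbors:

```python
# Sketch: count nonzero entries in two attention-mask structures.
# A dense n x n matrix costs O(n^2); a sliding window of half-width w
# costs O(n * w). Names here are illustrative assumptions.

def full_attention_entries(n: int) -> int:
    # Every one of n tokens attends to all n tokens.
    return n * n

def sliding_window_entries(n: int, w: int) -> int:
    # Each token i attends to positions [i - w, i + w], clipped
    # to the sequence boundaries, i.e. at most 2*w + 1 entries per row.
    return sum(min(n, i + w + 1) - max(0, i - w) for i in range(n))

n = 1_000_000  # genome-scale sequence length
print(full_attention_entries(n))       # 10^12 weights: infeasible
print(sliding_window_entries(n, 128))  # roughly n * 257: tractable
```

The dense count grows quadratically with sequence length, while the windowed count grows linearly, which is the structural difference the question asks you to identify and explain.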
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing Computational Bottlenecks in Attention Mechanisms
A team is designing a model to analyze genomic sequences that are millions of characters long. They observe that using a standard attention mechanism, where every character potentially attends to every other character, is computationally infeasible. If they switch to a mechanism that enforces a sparse attention weight matrix, what is the fundamental trade-off they are making?
Interpreting Attention Matrix Structures