Learn Before
A team is tasked with using a transformer-based model to summarize an entire book. The standard model architecture cannot process the entire book's text at once due to its length. The team implements a strategy where the book is broken into smaller, manageable chunks, each chunk is processed by the model, and the outputs are then combined. What is the fundamental computational bottleneck in the standard architecture that this segmentation strategy is designed to circumvent?
0
1
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence Parallelism
A team is tasked with using a transformer-based model to summarize an entire book. The standard model architecture cannot process the entire book's text at once due to its length. The team implements a strategy where the book is broken into smaller, manageable chunks, each chunk is processed by the model, and the outputs are then combined. What is the fundamental computational bottleneck in the standard architecture that this segmentation strategy is designed to circumvent?
Analyzing a Hierarchical Transformer for Genomic Data
Applying a Segmentation Strategy for Long-Form Audio