Scaling Limitations of Standard Transformers

A startup with a limited computational budget is tasked with building a system to analyze and summarize entire books for a digital library. A key requirement is that the model must process the full context of these very long documents simultaneously. Why would a standard transformer architecture be a poor choice for this specific task, and what is the implication for model selection?
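As a quick illustration of why the standard architecture struggles here: self-attention builds an n × n score matrix over the input, so compute and memory grow quadratically with sequence length n. The Python sketch below estimates the memory of those score matrices alone; the head count (12) and 4-byte (fp32) values are illustrative assumptions, not figures from this card.

```python
# Back-of-the-envelope sketch: memory for the n x n attention score
# matrices of a single transformer layer. Head count and precision
# are assumed values chosen only for illustration.

def attention_matrix_gib(n_tokens: int, n_heads: int = 12, bytes_per_value: int = 4) -> float:
    """Memory for one layer's attention score matrices, in GiB."""
    return n_tokens * n_tokens * n_heads * bytes_per_value / (1024 ** 3)

# A paragraph, a long article, and roughly a short book's worth of tokens.
for n in (512, 4_096, 100_000):
    print(f"n = {n:>7,}: {attention_matrix_gib(n):,.2f} GiB per layer")
# n =     512: 0.01 GiB per layer
# n =   4,096: 0.75 GiB per layer
# n = 100,000: 447.03 GiB per layer
```

At book scale the quadratic term dominates the budget, which is why the Related topics below point toward sub-quadratic, efficient-attention architectures when selecting a model for long-document tasks.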
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Taxonomy of Efficient Transformers
High-Performance Computing Improvements for Transformers
Language Model Scaling Problem
Developing Efficient Architectures and Training for Long-Sequence Self-Attention