Architectural Trade-offs for Long-Sequence Modeling
A research team is building a model to perform question-answering over entire technical manuals, which can be hundreds of pages long. They find that a standard model architecture, where every token in the input can attend directly to every other token, is computationally infeasible because its compute and memory costs grow quadratically with the length of the manual. The team proposes a new architecture in which each token attends only to a fixed-size window of its immediate neighbors (e.g., the 512 tokens before and after it). Evaluate the most significant trade-off of this proposed architectural change for their specific task.
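The scale of the trade-off can be made concrete with a back-of-the-envelope count. The sketch below (illustrative only, not code from the scenario) counts the query-key pairs scored by full self-attention versus a sliding-window variant where each token attends only to positions within ±w of itself:

```python
# Sketch: compare attention cost for full vs. sliding-window attention.
# `n` is the sequence length, `w` the one-sided window size (assumed values).

def full_attention_pairs(n: int) -> int:
    # Every token attends to every token, including itself: O(n^2).
    return n * n

def windowed_attention_pairs(n: int, w: int) -> int:
    # Token i attends only to positions max(0, i-w) .. min(n-1, i+w): O(n*w).
    return sum(min(n - 1, i + w) - max(0, i - w) + 1 for i in range(n))

n, w = 100_000, 512  # a manual-length input vs. a 512-token window
print(full_attention_pairs(n))         # 10,000,000,000 pairs
print(windowed_attention_pairs(n, w))  # ~102 million pairs, roughly 100x fewer
```

The savings grow with document length, but so does the cost of the trade-off: any two facts separated by more than the window can only influence each other through a chain of intermediate tokens, which is exactly the kind of long-range dependency a question about page 3 answered on page 200 requires.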
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Efficient Architectures for Long-Document Analysis
A research team is designing a new language model specifically for summarizing entire books, which requires processing extremely long sequences of text. Their primary constraint is a limited computational budget, which restricts both training time and the memory available on their hardware. Which of the following architectural goals is most critical for the team to pursue to make their project feasible?