Case Study

Architectural Trade-offs for Long-Sequence Modeling

A research team is building a model to perform question answering over entire technical manuals, which can be hundreds of pages long. They find that a standard full self-attention architecture, in which every token in the input can attend directly to every other token, is computationally infeasible: its cost grows quadratically with the length of the manual. The team proposes a new architecture in which each token attends only to a fixed-size window of its immediate neighbors (e.g., the 512 tokens before and after it). Evaluate the most significant trade-off of this proposed architectural change for their specific task.
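To make the cost comparison concrete, here is a minimal NumPy sketch of the windowed-attention mask the team proposes. The sequence length of 4096 is an illustrative assumption (the case study only fixes the 512-token window); counting the allowed token pairs under each scheme is a rough proxy for attention compute and memory, not a full implementation.

```python
import numpy as np

def sliding_window_mask(n, w):
    """Boolean mask where token i may attend only to tokens j with |i - j| <= w."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w

# Compare the number of attended (query, key) pairs under each scheme.
n, w = 4096, 512  # assumed sequence length; half-window of 512 from the case study

full_pairs = n * n  # full self-attention: quadratic in n
local_pairs = int(sliding_window_mask(n, w).sum())  # roughly n * (2w + 1): linear in n

print(f"full attention:   {full_pairs:>10,} pairs")
print(f"window attention: {local_pairs:>10,} pairs")
```

Doubling the manual length quadruples `full_pairs` but only doubles `local_pairs`, which is why the windowed design scales to book-length inputs; the question asks what is given up in exchange.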


Updated 2025-09-26


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences

Evaluation in Bloom's Taxonomy