A development team is building a language model based on the standard Transformer architecture to summarize lengthy legal documents, often exceeding 10,000 tokens. They observe that the model's memory usage grows quadratically with the input length, leading to out-of-memory errors. Which of the following architectural modifications most directly targets the root cause of this specific memory issue?
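The quadratic growth comes from standard self-attention: for a sequence of n tokens, each attention head materializes an n x n score matrix (Q @ K.T) before the softmax, so memory scales with n squared. A minimal back-of-the-envelope sketch in Python (the sequence lengths, float width, and helper name are illustrative assumptions, not part of the original question) makes the scaling concrete:

    def score_matrix_megabytes(seq_len: int, bytes_per_float: int = 4) -> float:
        """Memory to materialize one seq_len x seq_len attention score matrix
        (Q @ K.T) in standard full self-attention, in megabytes."""
        return seq_len * seq_len * bytes_per_float / 1e6

    # Doubling the input length roughly quadruples the score-matrix memory:
    for n in (1_000, 5_000, 10_000, 20_000):
        print(f"seq_len={n:>6}: ~{score_matrix_megabytes(n):,.0f} MB per head, per layer")

At 10,000 tokens this is already ~400 MB per head per layer, which is why modifications that avoid materializing the full score matrix (e.g., sparse, sliding-window, or linear attention variants) address the root cause rather than the symptom.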
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
LLM Architecture Selection for a Legal Tech Application
Diagnosing LLM Performance Bottlenecks