Learn Before
Multiple Choice

A developer is designing a language model for summarizing very long legal documents, where details mentioned at the beginning can be crucial for the overall summary. To manage memory usage on a constrained hardware setup, the developer implements a sliding-window self-attention mechanism in which each new token attends only to the preceding 1,024 tokens. What is the most significant trade-off for this specific application?
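The mechanism described above can be sketched as a causal sliding-window attention mask. This is a minimal illustration, not the developer's actual implementation: the window is shrunk from 1,024 to 4 tokens so the effect is visible at a glance, and the function name is hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if query token i may attend to key token j.

    Causal sliding window: token i sees only tokens j with
    i - window < j <= i (itself and up to window-1 predecessors).
    """
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Toy window of 4 standing in for the question's 1,024-token window.
mask = sliding_window_mask(seq_len=8, window=4)

# Token 7 can still attend to recent tokens...
assert mask[7][6] is True
# ...but tokens 0-3 have slid out of its window entirely.
assert mask[7][0] is False
```

Each row of the mask has at most `window` True entries, so the key/value cache stays bounded regardless of document length. The trade-off the question probes follows directly: once the window slides past the opening clauses of a long legal document, the model can no longer attend to them when generating the summary.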

Updated 2025-09-26


Tags

Ch.5 Inference - Foundations of Large Language Models
