1Cademy - An engineer is optimizing a model for processing extremely long text sequences. To reduce the computational load, the model is designed so that each token primarily attends to a limited, local neighborhood of other tokens. The engineer observes that the model struggles to connect information from the end of a document back to key concepts introduced in the very first paragraph. Which of the following modifications best addresses this issue by providing a form of global context without sacrificing the overall computational efficiency?

Learn Before

Global Tokens for Attention

Multiple Choice

An engineer is optimizing a model for processing extremely long text sequences. To reduce the computational load, the model is designed so that each token primarily attends to a limited, local neighborhood of other tokens. The engineer observes that the model struggles to connect information from the end of a document back to key concepts introduced in the very first paragraph. Which of the following modifications best addresses this issue by providing a form of global context without sacrificing the overall computational efficiency?
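The setup described in the question can be sketched as an attention mask. This is a minimal illustration, assuming a symmetric local window plus a few designated global positions that attend to, and are attended by, every token. The names `sparse_attention_mask`, `window`, and `global_positions` are hypothetical and do not come from any particular library.

```python
def sparse_attention_mask(seq_len, window, global_positions):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when token i is allowed to attend to token j.
    All parameter names here are illustrative assumptions.
    """
    global_set = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            # Local neighborhood: tokens within `window` positions of each other.
            is_local = abs(i - j) <= window
            # Global tokens see every position and are seen by every position.
            touches_global = i in global_set or j in global_set
            mask[i][j] = is_local or touches_global
    return mask


def count_allowed(mask):
    """Count the attention pairs the mask permits."""
    return sum(sum(row) for row in mask)


# Local-only attention: the last token cannot reach the first token directly.
local_only = sparse_attention_mask(seq_len=16, window=2, global_positions=[])

# Same window, with position 0 marked as a global token.
with_global = sparse_attention_mask(seq_len=16, window=2, global_positions=[0])
```

With a fixed window and a fixed number of global positions, the number of permitted pairs grows roughly linearly with sequence length rather than quadratically, which is why this kind of change keeps the efficiency of local attention while giving distant tokens a shared point of contact.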

Updated 2025-10-04
