Learn Before
A language model is designed with an efficient attention mechanism in which each token can only interact with the 16 tokens on either side of it (a sliding window of radius 16). This model performs poorly on tasks that require summarizing a long document, because it cannot connect information in the introduction to information in the conclusion. Which of the following architectural changes is most specifically designed to solve this type of long-range dependency issue while largely preserving the model's computational efficiency?
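For intuition, here is a minimal sketch of the setup the question describes: a sliding-window attention mask of radius 16, augmented with a few designated global tokens in the style of Longformer/BigBird. The function name and the NumPy boolean-mask representation are illustrative assumptions, not taken from any particular implementation.

import numpy as np

def build_attention_mask(seq_len, window=16, global_positions=()):
    """Boolean mask: mask[i, j] is True if token i may attend to token j.

    Local attention: each token sees the `window` tokens on either side.
    Global tokens: positions in `global_positions` attend to everything and
    are attended to by every token, bridging distant parts of the sequence.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True          # sliding-window neighborhood
    for g in global_positions:
        mask[g, :] = True              # global token reads all positions
        mask[:, g] = True              # all positions read the global token
    return mask

# Without global tokens, token 0 and token 500 can never exchange
# information in a single attention layer:
m_local = build_attention_mask(512, window=16)
assert not m_local[0, 500]

# One global token (e.g. a [CLS]-style summary slot) links them in two hops:
m_global = build_attention_mask(512, window=16, global_positions=(0,))
assert m_global[500, 0] and m_global[0, 500]

With the global row and column added, any two tokens are connected within two attention hops, while the number of attended pairs stays roughly linear in sequence length rather than quadratic.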
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Information Loss in Fixed-Size Global Memory
Advantage of Global Tokens in Stabilizing Attention
Evaluating an Attention Mechanism for Legal Document Processing
In an attention mechanism that uses a fixed number of designated tokens as a form of global memory, continuously increasing the number of these special tokens is a guaranteed strategy to improve model performance on all tasks without introducing any negative consequences.
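A back-of-the-envelope sketch of why the claim above fails: each extra global token adds a full row and a full column to the attention pattern, so the cost grows toward dense attention as the global count rises. The counting model below is a simplification, and the function name and figures are illustrative only.

def sparse_attention_cost(n, window=16, num_global=0):
    """Approximate number of attended (query, key) pairs per layer."""
    local = n * (2 * window + 1)          # sliding-window pairs
    global_pairs = 2 * n * num_global     # global rows plus global columns
    return local + global_pairs

n = 4096
print(sparse_attention_cost(n, num_global=1))     # ~0.14M pairs
print(sparse_attention_cost(n, num_global=512))   # ~4.3M pairs
print(n * n)                                      # ~16.8M pairs: dense attention

At 512 global tokens the sparse pattern already costs about a quarter of full dense attention on a 4096-token sequence, which is one concrete "negative consequence" the statement overlooks.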