Learn Before
A language model is designed with an efficient attention mechanism in which each token can only interact with the 16 tokens on either side of it (a sliding window of radius 16). This model performs poorly on tasks that require summarizing a long document, because it cannot connect information in the introduction to information in the conclusion. Which of the following architectural changes is most specifically designed to solve this type of long-range dependency issue while largely preserving the model's computational efficiency?
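For intuition, here is a minimal sketch of the setup the question describes: a sliding-window attention mask of radius 16, augmented with a few designated global tokens in the style of Longformer/BigBird. The function name and the NumPy boolean-mask representation are illustrative assumptions, not taken from any particular implementation.

import numpy as np

def build_attention_mask(seq_len, window=16, global_positions=()):
    """Boolean mask: mask[i, j] is True if token i may attend to token j.

    Local attention: each token sees the `window` tokens on either side.
    Global tokens: positions in `global_positions` attend to everything and
    are attended to by every token, bridging distant parts of the sequence.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True          # sliding-window neighborhood
    for g in global_positions:
        mask[g, :] = True              # global token reads all positions
        mask[:, g] = True              # all positions read the global token
    return mask

# Without global tokens, token 0 and token 500 can never exchange
# information in a single attention layer:
m_local = build_attention_mask(512, window=16)
assert not m_local[0, 500]

# One global token (e.g. a [CLS]-style summary slot) links them in two hops:
m_global = build_attention_mask(512, window=16, global_positions=(0,))
assert m_global[500, 0] and m_global[0, 500]

With the global row and column added, any two tokens are connected within two attention hops, while the number of attended pairs stays roughly linear in sequence length rather than quadratic.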
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Information Loss in Fixed-Size Global Memory
Advantage of Global Tokens in Stabilizing Attention
Evaluating an Attention Mechanism for Legal Document Processing
In an attention mechanism that uses a fixed number of designated tokens as a form of global memory, continuously increasing the number of these special tokens is a guaranteed strategy to improve model performance on all tasks without introducing any negative consequences.
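A back-of-the-envelope sketch of why the claim above fails: each extra global token adds a full row and a full column to the attention pattern, so the cost grows toward dense attention as the global count rises. The counting model below is a simplification, and the function name and figures are illustrative only.

def sparse_attention_cost(n, window=16, num_global=0):
    """Approximate number of attended (query, key) pairs per layer."""
    local = n * (2 * window + 1)          # sliding-window pairs
    global_pairs = 2 * n * num_global     # global rows plus global columns
    return local + global_pairs

n = 4096
print(sparse_attention_cost(n, num_global=1))     # ~0.14M pairs
print(sparse_attention_cost(n, num_global=512))   # ~4.3M pairs
print(n * n)                                      # ~16.8M pairs: dense attention

At 512 global tokens the sparse pattern already costs about a quarter of full dense attention on a 4096-token sequence, which is one concrete "negative consequence" the statement overlooks.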