
Analyzing Attention Mechanisms for Long Sequences

A language model is designed for efficiency on very long documents. Its attention mechanism restricts each token to interacting only with a small window of nearby tokens. While this sparsity reduces computation, the model often fails to connect information across distant parts of the document, since no direct attention path links far-apart tokens. Explain precisely how designating the first few tokens of the sequence as 'global' (making them accessible to, and able to access, all other tokens) addresses this limitation while largely preserving the model's computational efficiency.
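The pattern the question describes resembles the local-window-plus-global-tokens scheme used by models such as Longformer. Below is a minimal NumPy sketch of how such an attention mask could be constructed; the function name `local_global_mask` and the parameters `window` and `num_global` are illustrative choices, not from any particular library.

```python
import numpy as np

def local_global_mask(seq_len: int, window: int, num_global: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if token i may attend to token j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Local band: each token attends to neighbors within `window` positions.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Global tokens: the first `num_global` tokens attend to every position,
    # and every position attends to them. This creates a two-hop path
    # between any pair of distant tokens (i -> global token -> j).
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    return mask

mask = local_global_mask(seq_len=16, window=2, num_global=2)
# Each non-global row allows only O(window + num_global) attended positions,
# independent of sequence length.
print(mask.sum(axis=1))
```

With a fixed window w and g global tokens, each row of the mask admits only O(w + g) attended positions, so attention cost grows roughly linearly in sequence length rather than quadratically, while any two distant tokens can still exchange information in two hops through a global token.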


Updated 2025-10-04


Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science