Multiple Choice

An engineer is optimizing a model for processing extremely long text sequences. To reduce the computational load, the model is designed so that each token primarily attends to a limited, local neighborhood of other tokens. The engineer observes that the model struggles to connect information from the end of a document back to key concepts introduced in the very first paragraph. Which of the following modifications best addresses this issue by providing a form of global context without sacrificing the overall computational efficiency?
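To make the setup in the question concrete, here is a minimal sketch of a purely local (sliding-window) attention pattern. It is an illustration only and not part of the original item. The function names, the symmetric window, and the sizes `seq_len = 16` and `window = 2` are all assumptions chosen for readability. The sketch shows both sides of the trade-off the engineer faces: the number of allowed token pairs grows far more slowly than in full attention, but the last token has no direct path to the first.

```python
def local_attention_mask(seq_len, window):
    """Build a seq_len x seq_len boolean mask for sliding-window attention.

    mask[i][j] is True when token i may attend to token j, i.e. when
    the two positions are at most `window` steps apart (assumed symmetric).
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def allowed_pairs(mask):
    """Count the (query, key) pairs the mask permits; a proxy for attention cost."""
    return sum(sum(row) for row in mask)


# Hypothetical sizes, chosen only to keep the example small.
seq_len, window = 16, 2
mask = local_attention_mask(seq_len, window)

# Can the final token attend directly to the very first token?
last_sees_first = mask[seq_len - 1][0]

# Cost of the local pattern versus full (all-pairs) attention.
local_cost = allowed_pairs(mask)
full_cost = seq_len * seq_len

print("last token attends to first token:", last_sees_first)
print("allowed pairs (local vs. full):", local_cost, "vs.", full_cost)
```

Under these assumptions, information from the first paragraph can reach the end of the document only indirectly, by passing through several stacked layers of overlapping windows. This is consistent with the behavior the engineer observes.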

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science