Learn Before
Determining the Context Window
A language model processes a sequence of tokens one by one. To compute the representation for the current token, it uses an attention mechanism that considers only a local context: the current token plus a fixed number of the most recent preceding tokens. The total number of tokens in this context is set by a 'window size' parameter.
Given the sequence [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10] and a window size of 4, which specific tokens form the context when the model is processing token T8?
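The selection rule described above can be sketched in a few lines of Python. This is an illustrative helper (the function name `context_window` is not from the original), assuming the window covers the current token plus the `window_size - 1` most recent preceding tokens:

```python
def context_window(tokens, current_index, window_size):
    """Return the tokens attended to when processing tokens[current_index]:
    the current token plus up to window_size - 1 preceding tokens."""
    start = max(0, current_index - (window_size - 1))  # clamp at sequence start
    return tokens[start:current_index + 1]

tokens = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10"]
# Processing T8 (index 7) with a window size of 4:
print(context_window(tokens, 7, 4))  # ['T5', 'T6', 'T7', 'T8']
```

The `max(0, ...)` clamp handles early positions, where fewer than `window_size - 1` preceding tokens exist.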
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Key Matrix from a Sliding Window
Value Matrix from a Sliding Window
An engineer is optimizing a language model that processes long documents using an attention mechanism that considers a fixed-size window of the most recent tokens. If the engineer decides to significantly increase the size of this window, what is the primary trade-off they will encounter?
Determining the Context Window
Diagnosing Long-Range Dependency Failures