Key Matrix from a Sliding Window
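A key matrix from a sliding window is a sub-matrix formed by selecting a contiguous sequence of key vectors. When denoted using slice notation as $\mathbf{K}_{[i-n_c+1,\,i]}$, where $i$ is the current position and $n_c$ is the window size, it is constructed by vertically stacking the key vectors from index $i-n_c+1$ up to $i$. This structure is represented by the formula:

$$\mathbf{K}_{[i-n_c+1,\,i]} = \begin{bmatrix} \mathbf{k}_{i-n_c+1} \\ \vdots \\ \mathbf{k}_{i} \end{bmatrix}$$

This matrix is a fundamental component in attention mechanisms that operate on a fixed-size context window.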
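
As a minimal sketch of this construction, assume the key vectors are stored as the rows of a nested list and are numbered from 1, with a current position `i` and a window size `window_size`. The function name and both parameter names are hypothetical. Clamping the start of the window to the first key when fewer than `window_size` keys are available is an assumption of this sketch, not something the definition above specifies.

```python
def sliding_window_keys(keys, i, window_size):
    """Stack the key vectors k_{i-window_size+1}, ..., k_i (1-indexed, inclusive)."""
    if not 1 <= i <= len(keys):
        raise ValueError("i must be a valid 1-indexed position")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Assumption: clamp to the first key when the window would start before it.
    start = max(1, i - window_size + 1)
    # Convert the 1-indexed inclusive range [start, i] to a 0-indexed slice.
    return keys[start - 1 : i]


# Six illustrative 2-dimensional key vectors k_1 .. k_6 (made-up values).
keys = [[float(t), float(10 * t)] for t in range(1, 7)]

# Window of size 3 ending at position 5: rows k_3, k_4, k_5.
window = sliding_window_keys(keys, i=5, window_size=3)
```

Here `window` holds three rows, one per key vector in the window, in their original order. Under the clamping assumption, a call made early in the sequence (for example `i=2` with `window_size=3`) returns fewer rows.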

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Value Matrix for Causal Attention (V_≤i)
Consider the following three row vectors: r_1 = [5, 0, 3], r_2 = [1, 2, 8], and r_3 = [4, 7, 6]. If a matrix M is constructed by vertically stacking these vectors in the order r_1, r_2, then r_3 (with r_1 as the top row), what is the resulting matrix M?
A matrix M is formed by vertically stacking its row vectors, m_0, m_1, and m_2. Given the matrix M shown below, identify the row vector m_1.
A matrix A is constructed by vertically stacking four row vectors, where each row vector contains five elements. The resulting matrix A will have 5 rows and 4 columns.
Value Matrix from a Sliding Window
An engineer is optimizing a language model that processes long documents using an attention mechanism that considers a fixed-size window of the most recent tokens. If the engineer decides to significantly increase the size of this window, what is the primary trade-off they will encounter?
Determining the Context Window
Diagnosing Long-Range Dependency Failures
Consider a matrix that contains 10 row vectors, indexed from 1 to 10 (i.e., ). The notation is used to select a sub-matrix by vertically stacking the row vectors from index to index , inclusive. Which of the following sub-matrices correctly represents ?
The notation is used to select a slice of row vectors from a larger matrix . This slice contains a total of ____ row vectors.
Extracting a Context Window from a Token Matrix
Learn After
Formula for Fixed-Size Window Memory
Suppose you have a sequence of key vectors represented as rows in a matrix, where , , , , and . Given a processing step at index and a context window size of , which matrix is constructed by selecting the contiguous block of key vectors ending at the current step?
Properties of a Sliding Window Key Matrix
True or False: For a sequence of key vectors being processed at index with a context window size of , the resulting sub-matrix of key vectors, denoted , will contain the key vectors from index 5 to index 10 (i.e., ).