Learn Before
An autoregressive model processes an input sequence one token at a time. At each position i, it constructs a matrix containing all value vectors from the beginning of the sequence up to and including position i. Arrange the matrices below in the order they would be constructed as the model processes the first three positions (indexed 0, 1, and 2).
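To make the construction concrete, here is a minimal sketch (assuming hypothetical random value vectors and a value dimension of 4) of how the value matrix grows as the model steps through positions 0, 1, and 2:

```python
import numpy as np

np.random.seed(0)
d_v = 4
# Hypothetical value vectors v_0, v_1, v_2 for the first three positions.
values = [np.random.randn(d_v) for _ in range(3)]

# At each position i, the value matrix stacks v_0 .. v_i,
# so it gains one row per processed token.
for i in range(3):
    V_i = np.stack(values[: i + 1])  # shape: (i + 1, d_v)
    print(f"position {i}: V has shape {V_i.shape}")
```

So the matrices are constructed in order of increasing row count: a 1-row matrix at position 0, then 2 rows, then 3.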
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Attention Input Structure
An autoregressive model processes an input sequence of 5 tokens, indexed 0 through 4. When calculating the output for the token at index 3, the attention mechanism needs to access a specific set of 'value' vectors from the sequence. Which of the following correctly describes the collection of value vectors available to the query at index 3?
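The access pattern described here can be sketched with a causal (lower-triangular) mask, a standard way of expressing which positions a query may attend to; the sequence length of 5 and query index 3 match the question above:

```python
import numpy as np

seq_len = 5
# Causal mask: entry (q, k) is True when the query at index q may
# attend to the key/value at index k, i.e. when k <= q.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Value vectors visible to the query at index 3: indices 0 through 3.
visible = np.nonzero(mask[3])[0]
print(visible.tolist())  # [0, 1, 2, 3]
```

The query at index 3 therefore sees the value vectors at indices 0, 1, 2, and 3, but not the one at index 4.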
Causal Attention Value Matrix Dimensions
An autoregressive model processes an input sequence one token at a time. At each position i, it constructs a matrix containing all value vectors from the beginning of the sequence up to and including position i. Arrange the matrices below in the order they would be constructed as the model processes the first three positions (indexed 0, 1, and 2).