Learn Before
Causal Attention Value Matrix Dimensions
An autoregressive model is processing a sequence of tokens. To calculate the output for the token at index 8, it uses a specific matrix of value vectors. If each individual value vector in the sequence has 256 dimensions, what are the dimensions of the value matrix used for this specific calculation? Explain how you arrived at this answer.
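A minimal NumPy sketch of the reasoning: under causal attention, the query at index 8 can attend to positions 0 through 8, so the value matrix stacks 9 value vectors of 256 dimensions each. The variable names here are illustrative, not from any particular framework.

```python
import numpy as np

# Given: each value vector has 256 dimensions.
d_v = 256
query_index = 8  # token whose output we are computing

# Causal attention sees positions 0..query_index inclusive -> 9 rows.
value_matrix = np.stack([np.zeros(d_v) for _ in range(query_index + 1)])
print(value_matrix.shape)  # (9, 256)
```

The row count is `query_index + 1` because indexing starts at 0, giving a 9 × 256 matrix.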
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Attention Input Structure
An autoregressive model processes an input sequence of 5 tokens, indexed 0 through 4. When calculating the output for the token at index 3, the attention mechanism needs to access a specific set of 'value' vectors from the sequence. Which of the following correctly describes the collection of value vectors available to the query at index 3?
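One way to visualize which value vectors are available is a lower-triangular causal mask; row 3 of the mask marks the positions the query at index 3 may attend to. This is an illustrative sketch, not a specific library's API.

```python
import numpy as np

# Causal mask for a 5-token sequence (indices 0..4): True where
# a query position (row) may attend to a key/value position (column).
seq_len = 5
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Positions visible to the query at index 3.
visible = np.flatnonzero(mask[3]).tolist()
print(visible)  # [0, 1, 2, 3]
```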
Causal Attention Value Matrix Dimensions
An autoregressive model processes an input sequence one token at a time. At each position i, it constructs a matrix containing all value vectors from the beginning of the sequence up to and including position i. Arrange the matrices below in the order they would be constructed as the model processes the first three positions (indexed 0, 1, and 2).
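A short sketch of how the value matrix grows across the first three positions, using a small illustrative value-vector dimension (the dimension 4 is an assumption for display only):

```python
import numpy as np

d_v = 4  # illustrative value-vector dimension (assumption)
values = [np.full(d_v, i, dtype=float) for i in range(3)]  # v_0, v_1, v_2

# At position i, the value matrix stacks v_0 .. v_i.
shapes = []
for i in range(3):
    V_i = np.stack(values[: i + 1])
    shapes.append(V_i.shape)
print(shapes)  # [(1, 4), (2, 4), (3, 4)]
```

The matrices are constructed in order of increasing row count: one row at position 0, two at position 1, three at position 2.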