A large language model is generating the next token in a sequence. Arrange the following steps in the correct chronological order as they relate to updating the matrix of 'key' vectors for the attention mechanism.
Tags
Ch.5 Inference - Foundations of Large Language Models
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Related
In a transformer model generating text, a matrix of 'key' vectors is maintained for all previously generated tokens. Suppose at a certain step this matrix K contains vectors for two previous tokens:

K = [[0.1, 0.5],
     [0.9, 0.2]]

The model then processes a new token and generates a corresponding new key vector k_new:

k_new = [0.4, 0.8]

Based on the standard procedure for expanding this matrix during text generation, what will the updated matrix K be after incorporating k_new?
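The standard update appends the new key vector as an additional row rather than modifying existing entries. A minimal NumPy sketch of that step, using the values from the question:

```python
import numpy as np

# Key matrix for the two previously generated tokens.
K = np.array([[0.1, 0.5],
              [0.9, 0.2]])

# Key vector produced for the newly processed token.
k_new = np.array([0.4, 0.8])

# Standard KV-cache update: append k_new as a new row;
# earlier rows are left untouched.
K_updated = np.vstack([K, k_new])
print(K_updated)
# [[0.1 0.5]
#  [0.9 0.2]
#  [0.4 0.8]]
```

The updated matrix therefore has three rows, one key vector per token generated so far.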
In an autoregressive language model generating a sequence of text, the matrix containing 'key' vectors for previously generated tokens is updated at each step. Consider a scenario where this matrix has been populated with vectors from the first 10 tokens. When the 11th token is processed and its corresponding key vector is generated, the update procedure appends the new key vector as an additional row, growing the matrix from 10 to 11 rows; the key vectors of earlier tokens are left unchanged, not overwritten.
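The growth of the cache over a full generation run can be sketched as a loop. This is a minimal illustration, not a model implementation: `fake_key_for` is a hypothetical stand-in for the key projection (in a real transformer, W_K applied to the token's hidden state), and the key dimension of 2 matches the example above.

```python
import numpy as np

d_k = 2                       # key dimension (matches the 2-d example above)
K = np.empty((0, d_k))        # empty key cache before any token is generated

def fake_key_for(step):
    # Hypothetical placeholder for the key projection of the token at `step`;
    # a real model would compute it from the token's hidden state.
    return np.full(d_k, float(step))

for step in range(11):        # generate 11 tokens
    k_new = fake_key_for(step)
    K = np.vstack([K, k_new])  # append a row each step; never overwrite row 0

# After the 11th token, the cache holds 11 key vectors,
# and the first token's key is still intact.
```

The design point the card tests: the cache grows by one row per token, which is why attention cost grows with sequence length; keeping the matrix size constant by evicting the oldest key is a separate technique (sliding-window attention), not the standard procedure.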