Constructing the Input Hidden State for a Prefix-Tuned Layer
In a transformer model adapted with prefix vectors, consider the input to layer l+1. The prefix for this layer consists of 10 trainable vectors. The hidden states corresponding to the original text input, produced by the previous layer l, form a sequence of 512 vectors. Each vector in the model has a dimension of 768. Describe the structure of the complete hidden state sequence that is fed into the self-attention mechanism of layer l+1, and state its final dimensions.
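Answer outline: the layer input is the 10 prefix vectors concatenated, along the sequence axis, in front of the 512 text hidden states, giving a sequence of 10 + 512 = 522 vectors of dimension 768, i.e. a 522 x 768 matrix. A minimal PyTorch sketch of this construction (tensor names are illustrative, and the batch dimension is omitted):

```python
import torch

# Dimensions from the question.
prefix_len, text_len, d_model = 10, 512, 768

# Trainable prefix vectors for layer l+1 (one row per virtual token).
prefix = torch.nn.Parameter(torch.randn(prefix_len, d_model))

# Hidden states for the 512 text positions, output by layer l.
text_hidden = torch.randn(text_len, d_model)

# Prepend the prefix along the sequence axis, so self-attention
# in layer l+1 sees one combined sequence.
layer_input = torch.cat([prefix, text_hidden], dim=0)

print(layer_input.shape)  # torch.Size([522, 768]) -> (10 + 512) x 768
```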
Tags
Ch.3 Prompting - Foundations of Large Language Models
Application in Bloom's Taxonomy
Related
In a specific parameter-efficient tuning method, each layer of a transformer is adapted by prepending a sequence of new, trainable vectors to the sequence of hidden states from the previous layer. Suppose for a given layer, the sequence of these new trainable vectors has a length of 20, and the sequence of hidden states corresponding to the original text input has a length of 128. After this layer processes the combined sequence, a new set of hidden states is generated. How is the complete hidden state sequence for the next layer constructed?
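Answer outline, following the construction stated in the question: the text-position outputs of the current layer (length 128) are kept, the outputs at the 20 prefix positions are not reused, and the next layer's own 20 new trainable vectors are prepended, giving a combined sequence of 20 + 128 = 148 vectors. A sketch under the same assumptions as above (the hidden width is not given here; 768 is assumed for illustration):

```python
import torch

prefix_len, text_len, d_model = 20, 128, 768  # d_model is an assumed width

# New trainable prefix vectors belonging to the *next* layer.
next_prefix = torch.nn.Parameter(torch.randn(prefix_len, d_model))

# Output of the current layer over the combined (20 + 128)-long sequence.
layer_output = torch.randn(prefix_len + text_len, d_model)

# Keep only the text-position outputs; the prefix positions are replaced
# by the next layer's own trainable vectors.
text_hidden = layer_output[prefix_len:]                 # shape (128, 768)
next_input = torch.cat([next_prefix, text_hidden], dim=0)

print(next_input.shape)  # torch.Size([148, 768]) -> (20 + 128) x 768
```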
Analyzing an Incorrect Hidden State Composition