In a specific parameter-efficient tuning method, each layer of a transformer is adapted by prepending a sequence of new, trainable vectors to the hidden states produced by the previous layer. Suppose that, for a given layer, this sequence of trainable vectors has length 20 and the hidden state sequence corresponding to the original text input has length 128. After the layer processes the combined 148-position sequence, it produces a new set of hidden states. How is the complete hidden state sequence for the next layer constructed?
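The mechanism the question describes matches prefix tuning, where every layer owns its own trainable prefix: the layer processes the 20 + 128 = 148 combined positions, the output states at the prefix positions are discarded, and the next layer prepends its own 20 trainable vectors to the 128 text hidden states. The PyTorch sketch below is a minimal illustration under that assumption; the names (PrefixTunedBlock, D_MODEL, the stand-in nn.TransformerEncoderLayer) are hypothetical and not from the source.

```python
# Minimal sketch of per-layer prefix construction, assuming prefix tuning
# with a separate trainable prefix at every layer. Illustrative only.
import torch
import torch.nn as nn

PREFIX_LEN, TEXT_LEN, D_MODEL = 20, 128, 512


class PrefixTunedBlock(nn.Module):
    """A frozen transformer layer plus this layer's trainable prefix."""

    def __init__(self):
        super().__init__()
        # Trainable prefix vectors for THIS layer: (20, d_model).
        self.prefix = nn.Parameter(torch.randn(PREFIX_LEN, D_MODEL))
        # Stand-in for the frozen pretrained layer.
        self.layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=8, batch_first=True)
        for p in self.layer.parameters():
            p.requires_grad = False

    def forward(self, text_hidden: torch.Tensor) -> torch.Tensor:
        """text_hidden: (batch, 128, d_model) states of the text tokens."""
        batch = text_hidden.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Combined input: 20 prefix vectors + 128 text states = 148 positions.
        combined = torch.cat([prefix, text_hidden], dim=1)
        out = self.layer(combined)            # (batch, 148, d_model)
        # Keep ONLY the 128 text positions; the prefix outputs are dropped,
        # because the next layer supplies its own trainable prefix.
        return out[:, PREFIX_LEN:, :]


blocks = nn.ModuleList(PrefixTunedBlock() for _ in range(2))
h = torch.randn(1, TEXT_LEN, D_MODEL)
for blk in blocks:
    h = blk(h)                                # each layer rebuilds 20 + 128
print(h.shape)                                # torch.Size([1, 128, 512])
```

Under this reading, the complete hidden state sequence fed to the next layer is again 148 positions long: that layer's own 20 trainable vectors followed by the 128 output hidden states at the text positions.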
Tags
Ch.3 Prompting - Foundations of Large Language Models
Application in Bloom's Taxonomy
Related
Analyzing an Incorrect Hidden State Composition
Constructing the Input Hidden State for a Prefix-Tuned Layer