In a Transformer architecture modified for prefix-tuning, the hidden states computed at the prefix positions are not carried forward: after each layer processes the combined sequence, only the hidden states corresponding to the original input are passed to the subsequent layer, while that next layer receives its own set of trainable prefix vectors. This is how the model has access to the learned task-specific information at every stage without letting layer-computed prefix states overwrite the learned per-layer prefixes.
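To make this data flow concrete, below is a minimal PyTorch-style sketch (not taken from the course; the names `prefix_tuned_forward`, `layers`, and `prefixes` are illustrative, and each layer is assumed to be a callable mapping a (seq_len, d_model) tensor to a tensor of the same shape). It shows each layer receiving its own trainable prefix and the output-selection step that keeps only the states for the original tokens.

```python
import torch

def prefix_tuned_forward(layers, prefixes, hidden_states):
    """Illustrative deep prefix-tuning data flow.

    layers:        frozen Transformer layers, each (seq_len, d_model) -> (seq_len, d_model)
    prefixes:      one trainable tensor per layer, each of shape (prefix_len, d_model)
    hidden_states: embeddings of the real input tokens, shape (input_len, d_model)
    """
    for layer, prefix in zip(layers, prefixes):
        prefix_len = prefix.shape[0]
        # Prepend this layer's own trainable prefix vectors to the real hidden states.
        combined = torch.cat([prefix, hidden_states], dim=0)
        out = layer(combined)
        # Output selection: discard the states computed at the prefix positions;
        # only the states for the original tokens flow to the next layer,
        # which will be given its own learned prefix instead.
        hidden_states = out[prefix_len:]
    return hidden_states

# Toy usage: two frozen "layers" (identity placeholders) and per-layer prefixes.
layers = [torch.nn.Identity(), torch.nn.Identity()]
prefixes = [torch.randn(4, 16, requires_grad=True) for _ in layers]
x = torch.randn(10, 16)   # 10 real input tokens, d_model = 16
y = prefix_tuned_forward(layers, prefixes, x)
print(y.shape)            # torch.Size([10, 16]) -- same length as the original input
```

One design consequence visible in the sketch: because the prefix states are re-injected from trainable parameters at every layer rather than propagated, the output sequence length always matches the original input, and the frozen layer weights never need to change.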
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Output Selection Formula in a Prefix-Tuned Transformer Layer
Inter-Layer Data Flow in Prefix-Tuning
Consequences of Output Selection in a Modified Transformer
In a Transformer layer adapted for prefix-tuning, the input consists of a set of trainable prefix vectors followed by the hidden states from the original input sequence. After this combined input is processed by the layer, the resulting hidden states corresponding to the prefix vectors are discarded, and only the states for the original sequence are passed on. What is the most critical reason for this selective output process?