Composition of Hidden States in a Prefix-Tuned Layer
In a prefix-tuned model, the complete hidden state for layer l, denoted as H^l, is formed by concatenating the prefix vectors with the processed hidden states of the original input sequence. This composition is represented by the formula:

H^l = [P^l ; \overline{H}^l]

where \overline{H}^l is the sequence of output hidden states corresponding to the original input, which can be further expanded as:

\overline{H}^l = [\overline{h}^l_1 ; \overline{h}^l_2 ; ... ; \overline{h}^l_n]
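The concatenation above can be sketched in a few lines of Python. This is an illustrative toy (the names, sizes, and placeholder values are assumptions, not from the source), using the 20-prefix / 128-input lengths mentioned elsewhere on this page:

```python
# Hypothetical sketch: composing the full input H^l of a prefix-tuned layer.
m, n, d = 20, 128, 64                  # prefix length, input length, hidden size (assumed)

P = [[0.0] * d for _ in range(m)]      # trainable prefix vectors P^l (placeholder values)
H_bar = [[1.0] * d for _ in range(n)]  # hidden states \overline{H}^l for the original input

# H^l = [P^l ; \overline{H}^l]: prefix vectors first, then the input's hidden states
H_full = P + H_bar
assert len(H_full) == m + n            # combined sequence of 148 vectors
```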
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
Calculating the Output Slice in Prefix-Tuning
Composition of Hidden States in a Prefix-Tuned Layer
Consider a prefix-tuned Transformer layer where the full input H^l is composed of prefix vectors followed by the original input's hidden states. The output passed to the subsequent layer, \overline{H}^{l+1}, is correctly obtained by applying the layer's transformation only to the hidden states corresponding to the original input, ignoring the prefix vectors during the computation.
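The output-slice selection asked about above (20 prefix vectors plus 128 input states, 148 outputs) can be sketched as follows. This is an assumed illustration, not code from the source; the layer output is a placeholder:

```python
m, n, d = 20, 128, 64                        # prefix length, input length, hidden size (assumed)

# Hypothetical layer output over the full (m + n)-position combined sequence
full_output = [[float(i)] * d for i in range(m + n)]

# Only the last n positions (those aligned with the original input) are kept
# as \overline{H}^{l+1}; the m prefix positions are discarded.
H_bar_next = full_output[m:]
assert len(H_bar_next) == n                  # 128 states passed to the next layer
assert H_bar_next[0][0] == float(m)          # first kept vector sits at position m
```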
Learn After
In a specific parameter-efficient tuning method, each layer of a transformer is adapted by prepending a sequence of new, trainable vectors to the sequence of hidden states from the previous layer. Suppose for a given layer, the sequence of these new trainable vectors has a length of 20, and the sequence of hidden states corresponding to the original text input has a length of 128. After this layer processes the combined sequence, a new set of hidden states is generated. How is the complete hidden state sequence for the next layer constructed?
Analyzing an Incorrect Hidden State Composition
Constructing the Input Hidden State for a Prefix-Tuned Layer