Next-Layer Input Composition Formula in Prefix-Tuning
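In a prefix-tuned model, the complete input hidden state for layer $l+1$, denoted as $\mathbf{H}^{l+1}$, is formed by concatenating the layer-specific trainable prefix vectors with the processed hidden states of the original input sequence. With $m+1$ prefix vectors and $n+1$ input positions, this composition is represented by the formula:

$$\mathbf{H}^{l+1} = [\mathbf{p}_0^{l+1}, \ldots, \mathbf{p}_m^{l+1}, \bar{\mathbf{H}}^{l+1}]$$

where $\bar{\mathbf{H}}^{l+1}$ is the sequence of output hidden states of layer $l$ corresponding to the original input, which can be further expanded as:

$$\bar{\mathbf{H}}^{l+1} = [\mathbf{h}_0^{l+1}, \ldots, \mathbf{h}_n^{l+1}]$$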
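The composition can be illustrated with a minimal Python sketch, assuming hidden states are plain lists of floats and that layer $l$ has already been run over its full input (prefix plus original tokens). The function name `compose_next_layer_input` and its parameters are hypothetical, and the sizes (20 prefix vectors, 128 input positions) simply mirror the example used in the related questions on this page.

```python
from typing import List, Sequence

Vector = List[float]


def compose_next_layer_input(
    next_prefix: Sequence[Vector],
    layer_output: Sequence[Vector],
    num_prefix: int,
) -> List[Vector]:
    """Build H^{l+1} = [p^{l+1}_0, ..., p^{l+1}_m, h^{l+1}_0, ..., h^{l+1}_n].

    layer_output is the full output of layer l over [prefix; input]; its first
    num_prefix positions belong to layer l's prefix and are discarded.
    """
    if num_prefix > len(layer_output):
        raise ValueError("layer output is shorter than the prefix")
    # \bar{H}^{l+1}: outputs at the positions of the original input tokens.
    input_part = list(layer_output[num_prefix:])
    # Prepend layer l+1's own trainable prefix (parameters, not layer outputs).
    return [list(p) for p in next_prefix] + input_part


# Usage with placeholder values (hypothetical, for illustration only).
d = 4
next_prefix = [[1.0] * d for _ in range(20)]          # p^{l+1}_0 .. p^{l+1}_19
layer_output = [[float(i)] * d for i in range(148)]  # stand-in for layer l's 148 outputs
h_next = compose_next_layer_input(next_prefix, layer_output, num_prefix=20)
print(len(h_next))    # 148
print(h_next[20][0])  # 20.0, the first original-input position
```

The design choice to illustrate: the outputs that layer $l$ produces at its prefix positions are dropped, and the prefix for layer $l+1$ comes from that layer's own trainable vectors, so every layer has independent prefix parameters.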

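The retained slice $\bar{\mathbf{H}}^{l+1}$ still depends on the prefix, because the layer processes the combined sequence before the prefix positions are dropped. This can be sketched with a toy layer, assuming pure-Python lists for vectors. `toy_attention_layer` is a hypothetical stand-in (one attention head with identity query/key/value projections plus a residual connection, with no feed-forward block or normalization), not a real Transformer layer, and `prefix_layer_step` is likewise an illustrative name.

```python
import math
from typing import List

Vector = List[float]


def toy_attention_layer(seq: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a Transformer layer: one attention head with
    identity Q/K/V projections and a residual connection. Every position,
    including the original-input positions, attends over the whole sequence."""
    d = len(seq[0])
    out = []
    for q in seq:
        # Scaled dot-product scores against every key in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed = [sum(w * v[j] for w, v in zip(weights, seq)) / total for j in range(d)]
        out.append([x + y for x, y in zip(q, mixed)])  # residual connection
    return out


def prefix_layer_step(prefix: List[Vector], hidden: List[Vector]) -> List[Vector]:
    """One prefix-tuned layer: run the layer on [prefix; hidden], then keep
    only the outputs at the original-input positions."""
    full_input = prefix + hidden              # H^l
    full_output = toy_attention_layer(full_input)
    return full_output[len(prefix):]          # \bar{H}^{l+1}: drop prefix positions


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prefix = [[2.0, -1.0]]                        # placeholder prefix vector
out = prefix_layer_step(prefix, hidden)
print(len(out))                               # 3
print(out != toy_attention_layer(hidden))     # True
```

Because each input position attends to the prefix vectors, running the layer without them changes the retained outputs; the prefix positions are removed only after the layer has produced its full set of outputs.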
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
Calculating the Output Slice in Prefix-Tuning
Consider a prefix-tuned Transformer layer where the full input $\mathbf{H}^l$ is composed of prefix vectors followed by the original input's hidden states. The output passed to the subsequent layer, $\bar{\mathbf{H}}^{l+1}$, is correctly obtained by applying the layer's transformation only to the hidden states corresponding to the original input, ignoring the prefix vectors during the computation.
Next-Layer Input Composition Formula in Prefix-Tuning
In a multi-layer transformer model adapted for prefix-based tuning, the input to any given layer $L$ is formed by prepending a set of layer-specific trainable vectors (the 'prefix') to the sequence representation from the previous layer. After all computations within layer $L$ are finished, what is the precise composition of the input sequence for the next layer, $L+1$?
A single layer in a multi-layer model has been adapted for a tuning method where a set of trainable vectors (a 'prefix') is used. Arrange the following steps to accurately describe the complete data flow from the moment data enters this single layer until it is passed to the next.
Multi-Layer Input Composition in Prefix-Tuning
Next-Layer Input Composition Formula in Prefix-Tuning
Learn After
In a specific parameter-efficient tuning method, each layer of a transformer is adapted by prepending a sequence of new, trainable vectors to the sequence of hidden states from the previous layer. Suppose for a given layer, the sequence of these new trainable vectors has a length of 20, and the sequence of hidden states corresponding to the original text input has a length of 128. After this layer processes the combined sequence, a new set of hidden states is generated. How is the complete hidden state sequence for the next layer constructed?
Analyzing an Incorrect Hidden State Composition
Constructing the Input Hidden State for a Prefix-Tuned Layer