Output Selection Formula in a Prefix-Tuned Transformer Layer
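The selection of the last hidden states in a prefix-tuned Transformer layer can be expressed mathematically. The output passed to the next layer, $\overline{H}^{l+1}$, is derived by applying the layer's transformation to the full input $H^l$ (the prefix vectors followed by the hidden states of the original sequence) and then slicing the resulting sequence to retain only the final $m+1$ vectors, one for each of the $m+1$ positions of the original sequence. The formula is: $\overline{H}^{l+1} = \mathrm{Layer}(H^l)[-m-1:]$, where $[-m-1:]$ denotes the slicing operation that keeps the last $m+1$ elements of a sequence.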
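
The rule described above (run the layer over the full input, then keep only the trailing states) can be sketched in plain Python. This is a minimal illustration, not a real Transformer: toy_layer is a hypothetical stand-in for the layer's transformation, and the names prefix, hidden, and prefix_tuned_layer_output are assumptions introduced here for the example.

```python
from typing import Callable, List

Vector = List[float]


def toy_layer(states: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a Transformer layer.

    Adds the mean of the whole sequence to every position, so each output
    depends on all inputs (a crude imitation of attention mixing).
    """
    dim = len(states[0])
    mean = [sum(v[d] for v in states) / len(states) for d in range(dim)]
    return [[v[d] + mean[d] for d in range(dim)] for v in states]


def prefix_tuned_layer_output(
    prefix: List[Vector],
    hidden: List[Vector],
    layer: Callable[[List[Vector]], List[Vector]] = toy_layer,
) -> List[Vector]:
    """Apply the layer to [prefix; hidden], then keep only the trailing states."""
    if not hidden:
        # Guard: a slice starting at -0 would return the whole list.
        return []
    full_input = prefix + hidden       # prefix vectors, then the original hidden states
    full_output = layer(full_input)    # same length as full_input
    return full_output[-len(hidden):]  # drop the outputs at the prefix positions


# Two prefix vectors prepended to three hidden states of dimension 2.
prefix = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
next_input = prefix_tuned_layer_output(prefix, hidden)
print(len(next_input))
```

Note that the prefix vectors still take part in the computation: the result differs from toy_layer(hidden), even though the outputs at the prefix positions themselves are discarded.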

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Inter-Layer Data Flow in Prefix-Tuning
Consequences of Output Selection in a Modified Transformer
In a Transformer layer adapted for prefix-tuning, the input consists of a set of trainable prefix vectors followed by the hidden states from the original input sequence. After this combined input is processed by the layer, the resulting hidden states corresponding to the prefix vectors are discarded, and only the states for the original sequence are passed on. What is the most critical reason for this selective output process?
In a Transformer architecture modified for prefix-tuning, the hidden state representations corresponding to the trainable prefix vectors are passed along with the main input's hidden states to the subsequent layer to ensure the model has access to the learned task-specific information at every stage.
A data processing script needs to extract the final portion of a sequence of numerical values. Given the sequence V = [15, 30, 45, 60, 75, 90, 105, 120], what is the output of the operation V[-5:]?
Generating a Sequence Slice
Code Snippet Review for Sequence Slicing
Learn After
A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
Calculating the Output Slice in Prefix-Tuning
Composition of Hidden States in a Prefix-Tuned Layer
Consider a prefix-tuned Transformer layer where the full input $H^l$ is composed of prefix vectors followed by the original input's hidden states. The output passed to the subsequent layer, $\overline{H}^{l+1}$, is correctly obtained by applying the layer's transformation only to the hidden states corresponding to the original input, ignoring the prefix vectors during the computation.