Calculating the Output Slice in Prefix-Tuning
A Transformer layer is configured for a fine-tuning method where a set of trainable vectors is prepended to the main input sequence. In a specific case, 32 trainable vectors are prepended to an original sequence of 512 hidden state vectors. The layer processes this combined sequence and produces a full output tensor, which we will call H_full_output.
- Write the specific Python-style slicing expression needed to select the correct portion of H_full_output to be passed to the next layer.
- Briefly explain the purpose of this selection process.
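A minimal sketch of the kind of answer expected, assuming a batch-first tensor layout of (batch, sequence, hidden) and using a NumPy array as a stand-in for a framework tensor (the hidden size of 64 is an arbitrary choice for illustration):

```python
import numpy as np

# Hypothetical dimensions: batch of 1, 32 prefix vectors prepended
# to 512 original positions, hidden size 64 (chosen arbitrarily).
num_prefix, orig_len, hidden = 32, 512, 64
H_full_output = np.random.randn(1, num_prefix + orig_len, hidden)

# Slice off the prefix positions: keep only the outputs that
# correspond to the original 512-vector sequence.
H_next = H_full_output[:, num_prefix:, :]
print(H_next.shape)  # (1, 512, 64)
```

The slice `H_full_output[:, 32:, :]` discards the output positions belonging to the trainable prefix, so the next layer receives a sequence of the same length (512) as the original input.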
Related
A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
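Under the same assumptions (NumPy stand-in, sequence-first layout, an arbitrary hidden size of 32), the selection described here can be sketched as:

```python
import numpy as np

# 20 trainable prefix vectors + 128 hidden states = 148 combined positions.
num_prefix, orig_len, hidden = 20, 128, 32
H_full = np.random.randn(num_prefix + orig_len, hidden)  # 148 output states

# Pass on only positions 20..147: the 128 outputs corresponding
# to the original input sequence. The 20 prefix outputs are dropped.
H_pass = H_full[num_prefix:]
print(H_pass.shape)  # (128, 32)
```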
Composition of Hidden States in a Prefix-Tuned Layer
Consider a prefix-tuned Transformer layer whose full input H^l is composed of prefix vectors followed by the original input's hidden states. The layer applies its transformation to the entire combined sequence, so the prefix vectors influence every position through attention. The output passed to the subsequent layer, \overline{H}^{l+1}, is then obtained by selecting only the output positions that correspond to the original input and discarding the positions belonging to the prefix.
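To illustrate why the prefix vectors matter even though their outputs are discarded, here is a toy single-head self-attention sketch (NumPy, with arbitrary sizes: 4 prefix vectors, 10 original positions, hidden size 16). The prefix rows act as extra keys and values, so they shape the outputs at the original positions before the final slice removes them:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16
prefix = rng.normal(size=(4, d))      # trainable prefix vectors (hypothetical)
hidden = rng.normal(size=(10, d))     # original hidden states from below
H = np.concatenate([prefix, hidden])  # combined input, length 14

# Self-attention over the FULL combined sequence: every original
# position attends to the prefix rows as well as to the other tokens.
scores = H @ H.T / np.sqrt(d)
out = softmax(scores) @ H             # full output, length 14

# Only the outputs at the original positions are passed onward;
# the prefix output positions are discarded.
H_next = out[len(prefix):]
print(H_next.shape)  # (10, 16)
```

The prefix thus never appears in the sliced output, yet it has already altered every retained vector through the attention weights.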