Consider a prefix-tuned Transformer layer where the full input H^l is composed of prefix vectors followed by the original input's hidden states. The layer processes the entire combined sequence, so the prefix vectors do participate in the computation (e.g., as extra positions the attention can attend to). The output passed to the subsequent layer, \overline{H}^{l+1}, is then obtained by keeping only the output hidden states at the positions of the original input and discarding the outputs at the prefix positions.
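A minimal NumPy sketch of this slicing, assuming illustrative sizes (20 prefix vectors, 128 input hidden states) and a hypothetical stand-in for the real Transformer layer:

```python
import numpy as np

# Illustrative sizes: 20 trainable prefix vectors + 128 input hidden states.
n_prefix, n_input, d = 20, 128, 64

def transformer_layer(h):
    # Hypothetical stand-in for a real Transformer layer: any
    # sequence-to-sequence map of shape (seq, d) -> (seq, d).
    return np.tanh(h)

prefix = np.random.randn(n_prefix, d)           # trainable prefix vectors
hidden = np.random.randn(n_input, d)            # hidden states from layer l

H_l = np.concatenate([prefix, hidden], axis=0)  # full input: 148 positions
out = transformer_layer(H_l)                    # layer processes all 148
H_next = out[n_prefix:]                         # keep only the last 128

print(H_next.shape)  # (128, 64)
```

The layer is applied to all 148 positions, but only the slice `out[n_prefix:]` corresponding to the original input is forwarded to the next layer.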
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
Calculating the Output Slice in Prefix-Tuning
Composition of Hidden States in a Prefix-Tuned Layer
Consider a prefix-tuned Transformer layer where the full input H^l is composed of prefix vectors followed by the original input's hidden states. The layer processes the entire combined sequence, prefix vectors included; the output passed to the subsequent layer, \overline{H}^{l+1}, is obtained by keeping only the output hidden states at the positions of the original input and discarding the outputs at the prefix positions.