1Cademy - Output Selection Formula in a Prefix-Tuned Transformer Layer

Learn Before

Output Selection in a Prefix-Tuned Transformer Layer
Sequence Slicing Notation for Last Elements

Formula

Output Selection Formula in a Prefix-Tuned Transformer Layer

In a prefix-tuned Transformer layer, only the last $m+1$ hidden states are passed on to the next layer. The output for the next layer, $\overline{\mathbf{H}}^{l+1}$, is obtained by applying the layer's transformation to the full input $\mathbf{H}^l$ and then slicing the resulting sequence to retain only the final $m+1$ vectors: $\overline{\mathbf{H}}^{l+1} = \mathrm{Layer}(\mathbf{H}^{l})[-m-1:] = \mathbf{h}_0^{l+1} \mathbf{h}_1^{l+1} \cdots \mathbf{h}_m^{l+1}$, where $[-m-1:]$ denotes the slicing operation that keeps the elements from position $-(m+1)$ through the end of the sequence.
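The formula can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration: `toy_layer` is a hypothetical stand-in for $\mathrm{Layer}(\cdot)$ (an unparameterized self-attention step, not a real Transformer layer), and the sizes (20 prefix vectors, $m+1 = 128$ hidden states, width 16) are arbitrary.

```python
import numpy as np

def toy_layer(h):
    """Hypothetical stand-in for Layer(.): maps (n, d) to (n, d).

    Uses weight-free self-attention so that every output row depends
    on every input row, including the prefix rows.
    """
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ h

def select_output(h_full, m):
    """Apply the layer to the full input, then keep the last m+1 rows."""
    return toy_layer(h_full)[-m - 1:]

# Assumed sizes: 20 prefix vectors followed by m + 1 = 128 hidden states.
prefix_len, m, d = 20, 127, 16
rng = np.random.default_rng(0)
h_full = rng.standard_normal((prefix_len + m + 1, d))

h_next = select_output(h_full, m)
print(h_next.shape)
```

In this sketch the slice drops the 20 output rows that sit in the prefix positions. The layer still sees the prefix rows as input, so the result generally differs from running the layer on the last $m+1$ input rows alone.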

Updated 2026-05-02

References

Reference of Foundations of Large Language Models Course

Learn After

A Transformer layer adapted for a specific fine-tuning method receives a combined input sequence. This input is created by prepending 20 trainable vectors to a sequence of 128 hidden states from the previous layer. After processing this combined sequence of 148 vectors, the layer produces a full set of 148 output hidden states. Which portion of this full output is selected to be passed on to the next layer in the network?
Calculating the Output Slice in Prefix-Tuning
Consider a prefix-tuned Transformer layer where the full input $\mathbf{H}^l$ is composed of prefix vectors followed by the original input's hidden states. The output passed to the subsequent layer, $\overline{\mathbf{H}}^{l+1}$, is correctly obtained by applying the layer's transformation only to the hidden states corresponding to the original input, ignoring the prefix vectors during the computation.
Next-Layer Input Composition Formula in Prefix-Tuning
