Input Composition in a Prefix-Tuned Transformer Layer
In prefix fine-tuning, the input sequence for a given layer $l$, denoted $\mathbf{H}^{l}$, is constructed by prepending a sequence of trainable prefix vectors to the hidden state outputs of the previous layer. The composition is:

$$\mathbf{H}^{l} = \left[\, \mathbf{p}^{l}_{1}, \ldots, \mathbf{p}^{l}_{m},\; \mathbf{h}^{l-1}_{1}, \ldots, \mathbf{h}^{l-1}_{n} \,\right]$$

Here, $\mathbf{p}^{l}_{1}$ to $\mathbf{p}^{l}_{m}$ are the trainable prefix vectors specific to layer $l$, and $\mathbf{h}^{l-1}_{1}$ to $\mathbf{h}^{l-1}_{n}$ are the selected hidden states from the output of the preceding layer: only the outputs at the original token positions are carried forward, since each layer injects its own fresh prefix vectors rather than reusing the previous layer's outputs at the prefix positions.
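As a concrete illustration, here is a minimal PyTorch sketch of this per-layer composition. It is not from the source; the wrapper class `PrefixTunedLayer` and every name in it are hypothetical, and practical implementations of prefix tuning often inject the prefix as attention key/value pairs rather than as full input vectors, which this sketch does not attempt.

```python
import torch
import torch.nn as nn

class PrefixTunedLayer(nn.Module):
    """Hypothetical wrapper: a frozen Transformer layer plus m trainable prefix vectors."""

    def __init__(self, frozen_layer: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.frozen_layer = frozen_layer
        # p^l_1 ... p^l_m : trainable prefix vectors specific to this layer
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        # Only the prefix is updated; the original layer weights stay frozen.
        for param in self.frozen_layer.parameters():
            param.requires_grad = False

    def forward(self, hidden_prev: torch.Tensor) -> torch.Tensor:
        # hidden_prev: [batch, n, d_model] = h^{l-1}_1 ... h^{l-1}_n
        batch = hidden_prev.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # H^l = [p^l_1, ..., p^l_m, h^{l-1}_1, ..., h^{l-1}_n]
        layer_input = torch.cat([prefix, hidden_prev], dim=1)
        out = self.frozen_layer(layer_input)
        # Select only the non-prefix positions: the next layer prepends
        # its own trainable prefix instead of reusing these outputs.
        return out[:, self.prefix.size(0):, :]

if __name__ == "__main__":
    # Toy usage with an off-the-shelf encoder layer (assumed shapes only).
    layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
    wrapped = PrefixTunedLayer(layer, prefix_len=4, d_model=16)
    h_prev = torch.randn(2, 5, 16)        # batch of 2, n = 5 tokens
    print(wrapped(h_prev).shape)          # torch.Size([2, 5, 16])
```

Note the design choice in the last line of `forward`: discarding the prefix-position outputs is what makes the hidden-state sequence passed between layers keep its original length $n$, matching the "selected hidden states" in the formula above.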

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Input Representation in a Transformer Layer
Comparison of Prompt Tuning and Prefix Fine-Tuning
A research team is adapting a pre-trained language model for a specialized legal document summarization task. To conserve computational resources, they decide against retraining the entire model. Instead, for each layer of the model's architecture, they introduce a small set of new, trainable vectors. These vectors are prepended to the sequence of hidden states that serve as input for that layer. During training, only these newly introduced vectors are updated, while the original model parameters are kept frozen. Which statement accurately analyzes the team's approach?
Evaluating a Parameter-Efficient Tuning Method
Efficiency of Prefix Fine-Tuning
Architectural Preservation by Separating Soft Prompts from LLMs
A development team is adapting a large language model for a new task using a method where they freeze all original model weights. For each layer in the model, they prepend a small, unique sequence of trainable vectors to that layer's input. Based on this description, which statement best evaluates the primary trade-off of this technique?
Your team is building a multi-tenant LLM service w...
You’re reviewing an internal design doc for adapti...
You’re implementing a PEFT approach for a customer...
You’re reviewing a teammate’s claim about a new PE...
Diagnosing a PEFT Implementation Bug: Prompt Tuning vs Prefix Fine-Tuning
Choosing and Explaining a PEFT Strategy Under Deployment Constraints
Selecting Prompt Tuning vs Prefix Fine-Tuning by Reasoning from Where Soft Prompts Enter the Transformer
Post-Deployment PEFT Choice and Prefix Input Composition for a Multi-Tenant LLM Service
Choosing Between Prompt Tuning and Prefix Fine-Tuning for a Latency-Critical, Multi-Task LLM Service
Root-Causing a Prefix-Tuning Rollout Regression in a Multi-Task LLM Platform
Transformer Layer Output Formula
General Formula for a Transformer Layer
A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?
Structure of a Transformer Layer's Input
When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a single, high-dimensional vector that represents the aggregated meaning of the entire sentence as computed by the first four layers.
Learn After
Output Selection in a Prefix-Tuned Transformer Layer
An internal layer of a large language model is adapted for a new task. Its input is a single matrix created by concatenating a sequence of newly introduced, task-specific vectors with the sequence of hidden state vectors produced by the preceding layer. Which statement correctly analyzes the properties of these two constituent sequences?
Input Matrix Dimension Calculation
Consider a Transformer layer where the input is formed by prepending a sequence of new, adjustable vectors to the sequence of hidden state outputs from the layer below. In this setup, every vector within the combined input matrix for this layer is a trainable parameter.