Learn Before
Architecture of Prefix Tuning
The architecture of prefix tuning augments a standard transformer model at every layer. As illustrated in the diagram, a sequence of trainable prefix vectors (e.g., p_1, ..., p_m) is prepended to the sequence of hidden states derived from the user input (e.g., h_1, ..., h_n) at each layer. The main language model's weights remain frozen; only the prefix vectors are updated during training. The final hidden states are used to generate predictions, and the task loss is backpropagated to optimize the prefixes for the specific task.
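A minimal PyTorch sketch may help make the mechanism concrete. Everything here (the PrefixedLayer wrapper, prefix_len, the use of nn.TransformerEncoderLayer as a stand-in for one frozen layer) is an illustrative assumption rather than a specific library's API; the point is that each layer carries its own trainable prefix matrix while the wrapped layer's weights stay frozen.

```python
import torch
import torch.nn as nn

class PrefixedLayer(nn.Module):
    """Hypothetical wrapper: a frozen transformer layer plus a trainable
    per-layer prefix prepended to the incoming hidden states."""

    def __init__(self, layer: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():   # freeze the base layer's weights
            p.requires_grad = False
        # The only trainable parameters: one prefix matrix for this layer.
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        batch = hidden.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prefix, run the frozen layer, then drop the prefix
        # positions so the sequence length is unchanged for the next layer.
        out = self.layer(torch.cat([prefix, hidden], dim=1))
        return out[:, self.prefix.size(0):, :]

# Stack of frozen layers, each with its own trainable prefix.
d_model, prefix_len = 64, 8
layers = nn.Sequential(*(
    PrefixedLayer(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        prefix_len, d_model)
    for _ in range(2)
))

# The optimizer sees only the prefixes; the base weights never move.
optimizer = torch.optim.AdamW(
    (p for p in layers.parameters() if p.requires_grad), lr=1e-3)
```

Because only the prefix matrices have requires_grad set, a call to loss.backward() populates gradients for them alone; the frozen layer weights receive none, which is exactly the training behavior the questions below probe.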

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architecture of Prefix Tuning
A team is tasked with adapting a very large, pre-trained language model for a specialized legal document analysis task. To conserve computational resources and avoid altering the base model, they freeze all of the original model's parameters. They then introduce a small set of new, trainable parameters that are prepended to the sequence of hidden states within each transformer layer. During training for the new task, only these new parameters are updated. Which statement best analyzes the main consequence of this specific training strategy?
Choosing a Fine-Tuning Strategy
Analyzing the Mechanism of Prefix Tuning
Learn After
A large language model is being adapted to a new task using the prefix tuning method. During the backpropagation phase of training, which components of the model architecture receive gradient updates?
A researcher is comparing two different methods for adapting a pre-trained transformer model, keeping the original model weights frozen. Method A prepends a sequence of trainable vectors to the input sequence before it enters the first layer. Method B prepends a sequence of trainable vectors to the sequence of hidden states at each layer of the model. Which statement best analyzes the architectural difference in how these methods influence the model's processing?
Layer-wise Influence of Prefixes
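The last comparison above distinguishes input-level prepending (Method A, as in prompt tuning) from per-layer prepending (Method B, prefix tuning). A schematic sketch, under the same illustrative assumptions as the earlier example, shows the architectural difference: Method A's trainable vectors enter once before the first layer, while Method B's enter at every layer.

```python
import torch

def method_a_forward(embeddings, layers, soft_prompt):
    # Method A (input-level prepending): trainable vectors are added
    # once; deeper layers see them only indirectly, through the hidden
    # states that earlier layers produce.
    h = torch.cat([soft_prompt, embeddings], dim=1)
    for layer in layers:
        h = layer(h)
    return h

def method_b_forward(embeddings, layers, prefixes):
    # Method B (per-layer prepending): a separate trainable prefix is
    # prepended at every layer, so each depth is influenced directly.
    h = embeddings
    for layer, prefix in zip(layers, prefixes):
        h = layer(torch.cat([prefix, h], dim=1))[:, prefix.size(1):, :]
    return h

# Shapes assumed: embeddings (batch, seq, d_model); soft_prompt
# (batch, prompt_len, d_model); each prefix (batch, prefix_len, d_model).
```

Dropping the prefix positions after each layer in Method B keeps the user sequence length constant across the stack, whereas Method A's soft prompt simply travels through every layer as ordinary positions.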