Architecture of Prefix Tuning

The architecture of prefix tuning augments a standard transformer model at each layer. As illustrated in the diagram, a sequence of trainable prefix vectors (e.g., $\mathbf{p}_0^l, \mathbf{p}_1^l$) is prepended to the sequence of hidden states from the user input (e.g., $\mathbf{h}_0^l, \mathbf{h}_1^l, \dots$) at every layer $l$. The main language model's weights are frozen; only these prefix vectors are updated during training. The final hidden states are used to generate predictions, and the loss is backpropagated to optimize the prefixes for the specific task.

[Figure: a transformer in which trainable prefix vectors are prepended to the input hidden states at every layer, while the backbone weights stay frozen.]
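To make the per-layer mechanics concrete, here is a minimal PyTorch sketch. It is not taken from the source: the class name, the toy nn.TransformerEncoderLayer backbone, and all dimensions (d_model, prefix_len, and so on) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Toy transformer encoder with trainable per-layer prefixes."""

    def __init__(self, d_model=64, n_heads=4, n_layers=2, prefix_len=8):
        super().__init__()
        # Frozen backbone: one standard encoder layer per depth l.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        for p in self.layers.parameters():
            p.requires_grad = False  # the language model's weights stay frozen

        # Trainable prefix vectors p_0^l, ..., p_{m-1}^l for every layer l.
        self.prefixes = nn.Parameter(0.02 * torch.randn(n_layers, prefix_len, d_model))
        self.prefix_len = prefix_len

    def forward(self, h):
        # h: (batch, seq_len, d_model) hidden states of the user input.
        batch = h.size(0)
        for l, layer in enumerate(self.layers):
            prefix = self.prefixes[l].expand(batch, -1, -1)  # broadcast over batch
            h = layer(torch.cat([prefix, h], dim=1))         # prepend prefixes at layer l
            h = h[:, self.prefix_len:]                       # keep only input positions
        return h


model = PrefixTunedEncoder()
out = model(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
# Only the prefixes receive gradients:
print([n for n, p in model.named_parameters() if p.requires_grad])  # ['prefixes']
```

Dropping the prefix positions after each layer keeps the sequence length fixed, so fresh prefixes can be prepended at the next layer, mirroring the per-layer prepending described above. Practical implementations often realize the prefixes as extra key/value states inside each attention layer (and reparameterize them through a small MLP during training) rather than literally concatenating hidden states, but the concatenation view matches the diagram's description.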


Tags: Ch.3 Prompting - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences