Learn Before
Architecture of Prefix Tuning
The architecture of prefix tuning augments a standard transformer model at every layer. As illustrated in the diagram, a sequence of trainable prefix vectors (e.g., p_1, ..., p_m) is prepended to the sequence of hidden states derived from the user input (e.g., h_1, ..., h_n) at each layer. The main language model's weights remain frozen; only the prefix vectors are updated during training. The final hidden states are used to generate predictions, and the task loss is backpropagated to optimize the prefixes for the specific task.
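A minimal PyTorch sketch may help make the mechanism concrete. Everything here (the PrefixedLayer wrapper, prefix_len, the use of nn.TransformerEncoderLayer as a stand-in for one frozen layer) is an illustrative assumption rather than a specific library's API; the point is that each layer carries its own trainable prefix matrix while the wrapped layer's weights stay frozen.

```python
import torch
import torch.nn as nn

class PrefixedLayer(nn.Module):
    """Hypothetical wrapper: a frozen transformer layer plus a trainable
    per-layer prefix prepended to the incoming hidden states."""

    def __init__(self, layer: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():   # freeze the base layer's weights
            p.requires_grad = False
        # The only trainable parameters: one prefix matrix for this layer.
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        batch = hidden.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prefix, run the frozen layer, then drop the prefix
        # positions so the sequence length is unchanged for the next layer.
        out = self.layer(torch.cat([prefix, hidden], dim=1))
        return out[:, self.prefix.size(0):, :]

# Stack of frozen layers, each with its own trainable prefix.
d_model, prefix_len = 64, 8
layers = nn.Sequential(*(
    PrefixedLayer(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        prefix_len, d_model)
    for _ in range(2)
))

# The optimizer sees only the prefixes; the base weights never move.
optimizer = torch.optim.AdamW(
    (p for p in layers.parameters() if p.requires_grad), lr=1e-3)
```

Because only the prefix matrices have requires_grad set, a call to loss.backward() populates gradients for them alone; the frozen layer weights receive none, which is exactly the training behavior the questions below probe.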

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architecture of Prefix Tuning
A team is tasked with adapting a very large, pre-trained language model for a specialized legal document analysis task. To conserve computational resources and avoid altering the base model, they freeze all of the original model's parameters. They then introduce a small set of new, trainable parameters that are prepended to the sequence of hidden states within each transformer layer. During training for the new task, only these new parameters are updated. Which statement best analyzes the main consequence of this specific training strategy?
Choosing a Fine-Tuning Strategy
Analyzing the Mechanism of Prefix Tuning
Learn After
A large language model is being adapted to a new task using the prefix tuning method. During the backpropagation phase of training, which components of the model architecture receive gradient updates?
A researcher is comparing two different methods for adapting a pre-trained transformer model, keeping the original model weights frozen. Method A prepends a sequence of trainable vectors to the input sequence before it enters the first layer. Method B prepends a sequence of trainable vectors to the sequence of hidden states at each layer of the model. Which statement best analyzes the architectural difference in how these methods influence the model's processing?
Layer-wise Influence of Prefixes
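The last comparison above distinguishes input-level prepending (Method A, as in prompt tuning) from per-layer prepending (Method B, prefix tuning). A schematic sketch, under the same illustrative assumptions as the earlier example, shows the architectural difference: Method A's trainable vectors enter once before the first layer, while Method B's enter at every layer.

```python
import torch

def method_a_forward(embeddings, layers, soft_prompt):
    # Method A (input-level prepending): trainable vectors are added
    # once; deeper layers see them only indirectly, through the hidden
    # states that earlier layers produce.
    h = torch.cat([soft_prompt, embeddings], dim=1)
    for layer in layers:
        h = layer(h)
    return h

def method_b_forward(embeddings, layers, prefixes):
    # Method B (per-layer prepending): a separate trainable prefix is
    # prepended at every layer, so each depth is influenced directly.
    h = embeddings
    for layer, prefix in zip(layers, prefixes):
        h = layer(torch.cat([prefix, h], dim=1))[:, prefix.size(1):, :]
    return h

# Shapes assumed: embeddings (batch, seq, d_model); soft_prompt
# (batch, prompt_len, d_model); each prefix (batch, prefix_len, d_model).
```

Dropping the prefix positions after each layer in Method B keeps the user sequence length constant across the stack, whereas Method A's soft prompt simply travels through every layer as ordinary positions.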