Learn Before
Prefix Tuning
Prefix tuning is a parameter-efficient fine-tuning (PEFT) technique for large language models. Instead of fine-tuning all the model's parameters, it keeps the original model frozen and introduces a small number of trainable vectors, called a prefix. This prefix is prepended to the sequence of hidden states at each transformer layer, and only these prefix parameters are optimized during training for a specific task. This approach allows the model to be adapted to new tasks by learning a small, task-specific prefix that steers the behavior of the larger frozen model.
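To make the mechanism concrete, below is a minimal PyTorch sketch of the idea. The wrapper class, its names, and all dimensions are hypothetical illustrations, not the paper's original implementation (which injects the prefix into the attention keys and values via a reparameterization network): the wrapped layer's weights are frozen, and a small trainable prefix is prepended to the hidden states entering each layer.

```python
import torch
import torch.nn as nn

class PrefixTunedLayer(nn.Module):
    """Hypothetical wrapper: freezes one transformer layer and prepends
    a trainable prefix to its hidden-state sequence."""

    def __init__(self, frozen_layer: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.frozen_layer = frozen_layer
        for p in self.frozen_layer.parameters():
            p.requires_grad = False  # the base model stays frozen
        # The only trainable parameters: one prefix per wrapped layer.
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch = hidden_states.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prefix so attention inside the layer can attend to it.
        out = self.frozen_layer(torch.cat([prefix, hidden_states], dim=1))
        # Drop the prefix positions so downstream shapes are unchanged.
        return out[:, self.prefix.size(0):, :]

# Toy usage: wrap two encoder layers; only the prefixes are optimized.
d_model, prefix_len = 64, 8
layers = nn.ModuleList(
    PrefixTunedLayer(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        prefix_len,
        d_model,
    )
    for _ in range(2)
)
x = torch.randn(3, 10, d_model)  # (batch, seq_len, d_model)
for layer in layers:
    x = layer(x)
trainable = [p for p in layers.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # updates prefixes only
print(sum(p.numel() for p in trainable), "trainable prefix parameters")
```

Because only the prefix parameters receive gradients, the optimizer touches a tiny fraction of the total parameter count while the frozen base model's behavior is steered by what the prefix contributes at each layer.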

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Transformer Encoder:
Standard Transformer Encoding Procedure
Role of Positional Embeddings in Order-Insensitive Models
Key Hyperparameters of a Transformer Encoder
Transformer Encoding of a Masked Bilingual Sentence Pair
Prefix Tuning
In a sequence-to-sequence model, the input is processed by a stack of six encoder layers with identical structures. A proposal is made to modify this architecture so that all six encoder layers share the exact same set of weights, with the goal of reducing the total number of model parameters. Which statement best analyzes the primary consequence of this change for the model's ability to process information?
A sentence is fed into the encoder side of a Transformer model. Arrange the following steps in the correct sequence to describe how the initial input is processed by the stack of encoders.
Improving a Transformer's Contextual Understanding
Learn After
Architecture of Prefix Tuning
A team is tasked with adapting a very large, pre-trained language model for a specialized legal document analysis task. To conserve computational resources and avoid altering the base model, they freeze all of the original model's parameters. They then introduce a small set of new, trainable parameters that are prepended to the sequence of hidden states within each transformer layer. During training for the new task, only these new parameters are updated. Which statement best analyzes the main consequence of this specific training strategy?
Choosing a Fine-Tuning Strategy
Analyzing the Mechanism of Prefix Tuning