Learn Before
Illustration of Prefix Fine-Tuning
Prefix fine-tuning can be illustrated through tasks like translation. During training, a sequence of trainable prefix vectors is prepended to the hidden states at each Transformer layer, and these vectors are updated by error gradients backpropagated from the output. By adjusting only these vectors, the model adapts to the specific task: the prefixes serve as learned prompts that activate the Large Language Model (LLM) without needing explicit text instructions (e.g., "Translate the following sentence"). At test time, the optimized prefix vectors are prepended to each layer so that the model performs the task.
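The mechanics above can be sketched in a toy NumPy example. This is a minimal illustration, not a real Transformer: the "layers" are hypothetical frozen projection matrices, and the point is only that (a) a per-layer prefix block is concatenated in front of the hidden states and (b) those prefixes are the only parameters that would receive gradients during training.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, prefix_len, seq_len, d = 2, 3, 4, 8

# Frozen "pre-trained" weights: one toy projection matrix per layer.
frozen_W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(num_layers)]

# Trainable prefix vectors: one (prefix_len, d) block per layer.
# In prefix fine-tuning, only these would be updated by the optimizer.
prefixes = [np.zeros((prefix_len, d)) for _ in range(num_layers)]

def forward(x):
    """Prepend each layer's prefix to the hidden states, then apply the frozen layer."""
    h = x
    for W, p in zip(frozen_W, prefixes):
        h = np.concatenate([p, h], axis=0)  # (prefix_len + current_len, d)
        h = np.tanh(h @ W)                  # frozen transformation, never updated
    return h

x = rng.standard_normal((seq_len, d))
out = forward(x)
# The sequence grows by prefix_len at every layer; the frozen weights stay untouched.
print(out.shape)  # (seq_len + num_layers * prefix_len, d)
```

In a real implementation the prefixes are injected into the attention keys and values of each layer rather than naively concatenated, but the parameter-efficiency argument is the same: the number of trainable values is `num_layers * prefix_len * d`, independent of the size of the frozen model.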
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
A researcher is adapting a large pre-trained language model for a new task. Instead of modifying the model's original parameters, they introduce a small set of new, trainable vectors. These vectors are prepended to the sequence of hidden states at the input of every transformer layer. During training, only these new vectors are updated. Which statement best analyzes the primary impact of this technique on the model's computation?
Improving a Parameter-Efficient Fine-Tuning Strategy
An engineer is adapting a large language model for a specialized task by introducing a set of trainable vectors. These vectors are prepended to the sequence of hidden states at the input of every layer in the model. During the adaptation process, the original model parameters remain unchanged, and only these new vectors are optimized. What is the most significant advantage of this specific approach compared to a method that only adds trainable vectors at the initial input layer?
Illustration of Prefix Fine-Tuning
Learn After
Consider a large, pre-trained language model being adapted for a specific task. During this adaptation process, a small sequence of new, trainable vectors is prepended to the input of each transformer layer. The original weights of the pre-trained model are not modified. The training objective is to minimize a task-specific loss by only updating the parameters of these newly introduced vector sequences. Which statement best analyzes how this adaptation method functions?
Information Flow in a Multi-Layer Tuning Process