An engineer is adapting a pre-trained language model for a new task. They want to add a small number of trainable vectors to guide the model's behavior without changing any of the original model weights. What is the fundamental architectural difference between a strategy that adds these vectors only to the input embedding layer versus one that adds them to the input of every transformer layer?
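The contrast the question asks about can be sketched in a few lines. This is a minimal NumPy toy (the `layer` function, dimensions, and all names are illustrative assumptions, not any real model's API): prompt tuning prepends one trainable block to the input embeddings only, while prefix-style tuning re-injects a separate trainable block at the input of every layer, so its trainable parameter count scales with depth.

```python
import numpy as np

d_model, n_layers, seq_len, n_prompt = 16, 4, 8, 5
rng = np.random.default_rng(0)

def layer(x):
    # Stand-in for a frozen transformer layer (weights never updated).
    return np.tanh(x)

x = rng.normal(size=(seq_len, d_model))  # frozen input embeddings

# Strategy 1 (prompt tuning): one trainable block, prepended once at the input.
soft_prompt = rng.normal(size=(n_prompt, d_model))   # the ONLY trainable params
h = np.concatenate([soft_prompt, x], axis=0)
for _ in range(n_layers):
    h = layer(h)

# Strategy 2 (prefix-style): a trainable block PER LAYER; the prefix positions
# are overwritten with fresh trainable vectors at every layer's input.
prefixes = rng.normal(size=(n_layers, n_prompt, d_model))  # trainable params
g = np.concatenate([prefixes[0], x], axis=0)
for l in range(n_layers):
    g = layer(g)
    if l + 1 < n_layers:
        g[:n_prompt] = prefixes[l + 1]  # re-inject next layer's prefix

# Trainable parameter counts: 5*16 = 80 vs 4*5*16 = 320.
```

The sketch makes the architectural difference concrete: both strategies leave `layer` untouched, but the per-layer variant carries `n_layers` times as many trainable vectors and can steer every layer's computation directly, not just the first.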
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of a Model Adaptation Strategy
Match each parameter-efficient adaptation method to the description of how it modifies a pre-trained model's architecture.