An engineer is adapting a pre-trained language model for a new task. They want to add a small number of trainable vectors to guide the model's behavior without changing any of the original model weights. What is the fundamental architectural difference between a strategy that adds these vectors only to the input embedding layer versus one that adds them to the input of every transformer layer?
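The contrast the question asks about can be sketched in a few lines. This is a minimal NumPy toy (the `layer` function, dimensions, and all names are illustrative assumptions, not any real model's API): prompt tuning prepends one trainable block to the input embeddings only, while prefix-style tuning re-injects a separate trainable block at the input of every layer, so its trainable parameter count scales with depth.

```python
import numpy as np

d_model, n_layers, seq_len, n_prompt = 16, 4, 8, 5
rng = np.random.default_rng(0)

def layer(x):
    # Stand-in for a frozen transformer layer (weights never updated).
    return np.tanh(x)

x = rng.normal(size=(seq_len, d_model))  # frozen input embeddings

# Strategy 1 (prompt tuning): one trainable block, prepended once at the input.
soft_prompt = rng.normal(size=(n_prompt, d_model))   # the ONLY trainable params
h = np.concatenate([soft_prompt, x], axis=0)
for _ in range(n_layers):
    h = layer(h)

# Strategy 2 (prefix-style): a trainable block PER LAYER; the prefix positions
# are overwritten with fresh trainable vectors at every layer's input.
prefixes = rng.normal(size=(n_layers, n_prompt, d_model))  # trainable params
g = np.concatenate([prefixes[0], x], axis=0)
for l in range(n_layers):
    g = layer(g)
    if l + 1 < n_layers:
        g[:n_prompt] = prefixes[l + 1]  # re-inject next layer's prefix

# Trainable parameter counts: 5*16 = 80 vs 4*5*16 = 320.
```

The sketch makes the architectural difference concrete: both strategies leave `layer` untouched, but the per-layer variant carries `n_layers` times as many trainable vectors and can steer every layer's computation directly, not just the first.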
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of a Model Adaptation Strategy
Match each parameter-efficient adaptation method to the description of how it modifies a pre-trained model's architecture.