Learn Before
Information Flow in a Multi-Layer Tuning Process
Consider a diagram illustrating a parameter-efficient tuning method for a large language model. For an arbitrary layer l in the model, the diagram shows a sequence of new, trainable vectors being introduced. These vectors are combined with the sequence of hidden states passed from the previous layer (l-1). This combined sequence then serves as the input to the main computational block of layer l. Based on the process described, explain two key aspects:
- What is the specific operation used to combine the new trainable vectors with the hidden states from the previous layer?
- During the training process, which set of parameters is modified to minimize the task-specific error: the new trainable vectors, the original weights of the main computational block of layer l, or both?
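
For reference, a minimal sketch of the per-layer flow the diagram describes, written in PyTorch. It assumes a prefix-tuning-style setup in which the new vectors are joined to the layer input by concatenation along the sequence dimension and the pre-trained block stays frozen; `PrefixedLayer`, `block`, and `prefix` are illustrative names, not taken from the diagram.

```python
# A minimal sketch (assuming a prefix-tuning-style setup; names are
# illustrative) of the per-layer flow the diagram describes.
import torch
import torch.nn as nn

class PrefixedLayer(nn.Module):
    def __init__(self, block: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.block = block                      # pre-trained block of layer l
        for p in self.block.parameters():
            p.requires_grad = False             # original weights stay frozen
        # the new, trainable vectors introduced at this layer
        self.prefix = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))

    def forward(self, h_prev: torch.Tensor) -> torch.Tensor:
        # h_prev: (batch, seq_len, d_model), hidden states from layer l-1
        prefix = self.prefix.unsqueeze(0).expand(h_prev.size(0), -1, -1)
        # the combined sequence feeds the layer's main computational block
        h_in = torch.cat([prefix, h_prev], dim=1)
        return self.block(h_in)
```

Because only `self.prefix` has `requires_grad=True` in this sketch, an optimizer built from the trainable parameters touches nothing else.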
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider a large, pre-trained language model being adapted for a specific task. During this adaptation process, a small sequence of new, trainable vectors is prepended to the input of each transformer layer. The original weights of the pre-trained model are not modified. The training objective is to minimize a task-specific loss by only updating the parameters of these newly introduced vector sequences. Which statement best analyzes how this adaptation method functions?
Information Flow in a Multi-Layer Tuning Process
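
The related question above looks at the same mechanism from the training side. Continuing the hypothetical sketch, a training step would build the optimizer only over the new vectors, so the pre-trained weights are never modified; the layer sizes and loss below are stand-ins, not values from the question.

```python
# Hypothetical training step: gradients flow through the frozen blocks,
# but only the per-layer prefix vectors are updated.
model = nn.Sequential(
    *[PrefixedLayer(nn.Linear(512, 512), prefix_len=8, d_model=512)
      for _ in range(4)]
)
trainable = [p for p in model.parameters() if p.requires_grad]  # prefixes only
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x = torch.randn(2, 16, 512)          # (batch, seq_len, d_model)
loss = model(x).pow(2).mean()        # stand-in for a task-specific loss
loss.backward()
optimizer.step()                     # updates only the new trainable vectors
```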