Learn Before
A research team is adapting a pre-trained language model to handle tasks requiring the analysis of very long documents, such as legal contracts. Their strategy involves modifying the model's architecture for greater efficiency. Arrange the following steps in the correct chronological order to execute this adaptation method.
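For concreteness, here is a minimal PyTorch sketch of the sequence this adaptation method implies: start from the pre-trained full-attention model, build a sparse-attention replacement, initialize it from the original weights rather than randomly, and only then fine-tune on long documents. The `SparseSelfAttention` class, the local-window pattern, and all hyperparameter values are illustrative assumptions, not a specific library's API.

```python
import torch
import torch.nn as nn

class SparseSelfAttention(nn.Module):
    """Local-window sparse attention whose projection weights have the same
    shapes as a full-attention layer, so it can be warm-started from one."""

    def __init__(self, d_model: int, window: int = 128):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        # Sparse pattern: each token attends only to a local window. (A real
        # implementation would avoid materializing the full score matrix.)
        idx = torch.arange(x.shape[1], device=x.device)
        blocked = (idx[None, :] - idx[:, None]).abs() > self.window
        scores = scores.masked_fill(blocked, float("-inf"))
        return self.out(scores.softmax(dim=-1) @ v)

# Step 1: the pre-trained full-attention model is the starting point.
pretrained = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

# Step 2: build the more efficient sparse-attention replacement.
sparse_attn = SparseSelfAttention(d_model=512)

# Step 3: initialize it from the pre-trained parameters, not randomly
# (the projection shapes match by construction).
with torch.no_grad():
    sparse_attn.qkv.weight.copy_(pretrained.self_attn.in_proj_weight)
    sparse_attn.qkv.bias.copy_(pretrained.self_attn.in_proj_bias)
    sparse_attn.out.weight.copy_(pretrained.self_attn.out_proj.weight)
    sparse_attn.out.bias.copy_(pretrained.self_attn.out_proj.bias)

# Step 4: fine-tune on the long-context dataset (dummy batch and objective
# here; a real run would stream tokenized long documents).
optimizer = torch.optim.AdamW(sparse_attn.parameters(), lr=1e-5)
long_docs = torch.randn(2, 1024, 512)
loss = sparse_attn(long_docs).pow(2).mean()
loss.backward()
optimizer.step()
```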
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Model Adaptation Strategies for Long-Context Tasks
A team is adapting a language model, originally pre-trained with a standard full attention mechanism, to handle tasks involving extremely long text sequences. Their strategy is to replace the full attention with a more computationally efficient sparse attention mechanism and then fine-tune the model on their long-context dataset. What is the primary reason for using the original model's parameters to initialize this new sparse-attention model, instead of starting the fine-tuning process with randomly initialized parameters?
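The rationale can be made concrete with a hedged sketch (PyTorch, illustrative values): when sparsity is imposed as an attention mask, no weight shapes change, so the pre-trained checkpoint loads into the adapted model unchanged. Fine-tuning then only has to adjust to the restricted attention pattern, rather than relearning linguistic knowledge from scratch as a random initialization would require.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len, window = 512, 8, 1024, 128

pretrained = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
adapted = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

# Warm start: every pre-trained tensor transfers, because sparsifying the
# attention pattern changes no parameter shapes.
adapted.load_state_dict(pretrained.state_dict())

# Sparsity as a banded mask: token i may attend only to |i - j| <= window.
idx = torch.arange(seq_len)
band_mask = (idx[None, :] - idx[:, None]).abs() > window  # True = blocked

x = torch.randn(2, seq_len, d_model)  # stand-in for a long-document batch
out = adapted(x, src_mask=band_mask)  # pre-trained knowledge, sparse pattern
print(out.shape)  # torch.Size([2, 1024, 512])
```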