Concept

Transferring the knowledge of a PTM (pre-trained model) to downstream NLP tasks involves the following choices:

  • Choose an appropriate pre-training task, model architecture, and corpus. Currently, language modeling is the most popular pre-training task and can efficiently solve a wide range of NLP problems. However, different pre-training tasks have their own biases and work differently well on different downstream tasks. Besides, the data distribution of the downstream task should be close to that of the PTM's pre-training corpus.

  • Choose appropriate layers. In a pre-trained deep model, different layers capture different kinds of information, such as POS tags, syntactic parses, long-term dependencies, semantic roles, and coreference (see the first sketch after this list).

  • There are two common ways of model transfer: feature extraction (the pre-trained parameters are frozen) and fine-tuning (the pre-trained parameters are unfrozen and updated). In the feature-extraction approach, the pre-trained model is regarded as an off-the-shelf feature extractor. In that case it is important to expose the internal layers, as they typically encode the most transferable representations (see the second sketch below).
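A minimal sketch of inspecting per-layer representations, assuming the Hugging Face transformers library; bert-base-uncased and the input sentence are purely illustrative choices:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load an illustrative PTM; output_hidden_states=True exposes every layer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("Mary said she would come.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer plus one tensor per
# Transformer layer. Lower layers tend to encode surface/syntactic
# features (e.g. POS), higher layers more semantic ones (e.g. coreference).
for i, layer in enumerate(outputs.hidden_states):
    print(f"layer {i}: shape {tuple(layer.shape)}")
```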
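And a minimal PyTorch sketch contrasting the two transfer modes; the classification head and the learning rates are illustrative assumptions, not prescribed values:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative PTM
head = nn.Linear(model.config.hidden_size, 2)  # e.g. binary classification

# Feature extraction: freeze the pre-trained parameters and train only
# the new task-specific head on top of the frozen features.
for param in model.parameters():
    param.requires_grad = False
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Fine-tuning: unfreeze the pre-trained parameters so they are updated
# together with the head, typically with a small learning rate.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(head.parameters()), lr=2e-5
)
```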


Updated 2022-05-29

Tags

Data Science
