Relation
Domain adaptation
Most PLMs are trained on general-domain text corpora. If the target domain differs substantially from the general domain, we can adapt the PLM by continuing its pre-training on in-domain data. For domains with abundant unlabeled text, such as biomedicine, pre-training a language model from scratch may also be a good choice.
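As a minimal sketch of what "continual pre-training" means mechanically, the toy example below continues masked-language-model (MLM) training on a synthetic "in-domain" corpus. The tiny model (`TinyMLM`) is a hypothetical stand-in; in practice you would load a real general-domain checkpoint (e.g., via the Hugging Face `transformers` library) and run the same loop on your domain text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 50    # toy vocabulary size
MASK_ID = 0   # reserved [MASK] token id
SEQ_LEN = 16

class TinyMLM(nn.Module):
    """Hypothetical stand-in for a pretrained encoder + MLM head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.encoder = nn.GRU(32, 32, batch_first=True)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, ids):
        h, _ = self.encoder(self.embed(ids))
        return self.head(h)  # per-token vocabulary logits

def mask_tokens(ids, p=0.15):
    """Standard MLM corruption: mask ~15% of tokens, predict the originals."""
    labels = ids.clone()
    mask = torch.rand(ids.shape) < p
    labels[~mask] = -100          # ignore unmasked positions in the loss
    corrupted = ids.clone()
    corrupted[mask] = MASK_ID
    return corrupted, labels

def continual_pretrain(model, corpus, steps=200, lr=1e-2):
    """Continue MLM training of `model` on in-domain `corpus` (token ids)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
    losses = []
    for _ in range(steps):
        inputs, labels = mask_tokens(corpus)
        logits = model(inputs)
        loss = loss_fn(logits.view(-1, VOCAB), labels.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Synthetic "in-domain" corpus: sequences drawn from a narrow token range,
# mimicking a domain whose distribution differs from the general one.
corpus = torch.randint(1, 10, (64, SEQ_LEN))
model = TinyMLM()
losses = continual_pretrain(model, corpus)
print(f"MLM loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loop is the same whether the starting weights are random (pre-training from scratch) or loaded from a general-domain checkpoint (domain-adaptive continual pre-training); only the initialization differs.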
Updated 2022-05-29
Tags
Data Science