Transfer Learning to Overcome Data Sparsity
Transfer learning uses knowledge acquired from a related task, language, or domain to overcome data sparsity in the target setting. The focus here lies on pre-trained language models, domain-specific pre-training, and multilingual language models.
Pre-trained Language Representations: Unlabeled data is used for self-supervised pre-training; one of the most popular objectives is masked language modeling, as in BERT. Because they are pre-trained on massive amounts of unlabeled text, these models can be used as a starting point for many other NLP applications, which makes them especially useful in low-resource scenarios: the general knowledge acquired during pre-training transfers to the target task.
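A minimal sketch of this idea, assuming the Hugging Face `transformers` library is installed: the pre-trained masked-language-modeling head of BERT fills in a blanked-out token, illustrating the general linguistic knowledge the model picked up from unlabeled text. The example sentence is illustrative only.

```python
from transformers import pipeline

# "bert-base-uncased" was pre-trained with masked language modeling
# on large unlabeled corpora (BooksCorpus + English Wikipedia).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts plausible fillers for the [MASK] token.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")

# The same checkpoint can then be reused as the starting point for a
# downstream task with only a small labeled set, e.g.:
# AutoModelForSequenceClassification.from_pretrained(
#     "bert-base-uncased", num_labels=2)
```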
Some open issues with this approach: pre-training is expensive because of the model's high data and compute requirements, so in a low-resource scenario, training a BERT model from scratch is not ideal. It is therefore essential that domain-specific pre-trained models are available for the task at hand.
Domain-specific Pre-training: Specialized domain language differs vastly from standard language; for example, the language of a scientific article is very different from that of a standard news article. This 'domain gap' between the pre-training domain and the target domain hurts transfer. Examples of domain-specific language models are SciBERT for scientific literature and BioBERT for biomedical literature. More powerful representations can therefore be achieved by continuing pre-training, and then fine-tuning, on the target domain.
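A minimal sketch of domain-adaptive pre-training with `transformers`: a general checkpoint continues masked-language-model training on in-domain text before any fine-tuning. The two-sentence corpus and the hyperparameters are placeholders; in practice this runs over a large domain corpus. Ready-made domain checkpoints can also be loaded directly, e.g. "allenai/scibert_scivocab_uncased" or "dmis-lab/biobert-v1.1".

```python
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "bert-base-uncased"           # general-domain starting point
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

domain_corpus = [                          # placeholder in-domain sentences
    "The patient presented with acute myocardial infarction.",
    "Gene expression was quantified via RNA sequencing.",
]
encodings = tokenizer(domain_corpus, truncation=True, padding=True,
                      max_length=64)

class DomainDataset(torch.utils.data.Dataset):
    """Wraps the tokenized domain corpus for the Trainer."""
    def __len__(self):
        return len(domain_corpus)
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in encodings.items()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-bert",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=DomainDataset(),
    # The collator randomly masks 15% of tokens: the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```

After this step, the adapted checkpoint is fine-tuned on the (small) labeled target-task data as usual.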
Multilingual Language Models: Resources from multiple languages are combined to build one large multilingual model. This provides advantages like zero-shot transfer in cross-lingual settings: a model fine-tuned in a high-resource language can be applied directly to a low-resource target language. Some open issues with this approach: these multilingual models are not truly universal language models; many low-resource languages are not covered, or not represented well enough, in their pre-training data.
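A minimal sketch of zero-shot cross-lingual transfer, assuming `transformers` and an XLM-R checkpoint: the model would be fine-tuned on labeled data in a high-resource language (omitted here) and then applied unchanged to a target language, relying on the shared multilingual representation space. The German example sentence is illustrative; with an un-fine-tuned classification head the prediction is random, so the snippet shows structure only.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"   # pre-trained on text in ~100 languages
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)

# ... fine-tune `model` on English labeled data here (omitted) ...

# Zero-shot inference on a target language never seen during fine-tuning.
inputs = tokenizer("Das Essen war ausgezeichnet.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label:", logits.argmax(dim=-1).item())
```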
Tags
Natural language processing
Data Science