Learn Before
Cross-Lingual Transfer from High-Resource to Low-Resource Languages
A significant advantage of multilingual pre-training is that low-resource languages can benefit from knowledge transferred from high-resource languages. This transfer is particularly effective when the high-resource languages in the training data are linguistically similar to the target low-resource language.
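To make this concrete, the sketch below (an illustration only; the `xlm-roberta-base` checkpoint, the sentence pair, and the mean-pooling step are assumptions, not part of the course text) embeds an English sentence and a Swahili sentence with the same multilingual encoder and measures their similarity. The roughly shared representation space across languages is what lets supervision gathered in a high-resource language carry over to a low-resource one.

```python
# Minimal sketch (illustrative): embed one English and one Swahili sentence with the
# same multilingual encoder and compare them. A shared representation space across
# languages is what cross-lingual transfer relies on.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # assumed multilingual checkpoint; any similar model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)       # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, dim)

en = embed("The movie was wonderful.")
sw = embed("Filamu ilikuwa nzuri sana.")  # Swahili: "The movie was very good."
print("cosine similarity:", torch.nn.functional.cosine_similarity(en, sw).item())
```

Sentence-level similarity from mean pooling is only a rough proxy, but it illustrates why fine-tuning such an encoder on one language can shift its predictions for another.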
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Cross-Lingual Text Classification Example
Cross-Lingual Transfer from High-Resource to Low-Resource Languages
A development team has a large, high-quality dataset for sentiment analysis in English. They need to create a similar sentiment analysis tool for Swahili, a language for which they have very little labeled data. The team has access to a powerful multilingual model pre-trained on a corpus including both English and Swahili. Based on the principles of leveraging knowledge from a data-rich language for a data-poor one, what is the most direct and effective strategy for the team to pursue? One such strategy is sketched in code after this Related list.
Analyzing a Cross-Lingual Model Implementation Failure
Explaining Zero-Shot Cross-Lingual Transfer
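For the English-to-Swahili sentiment scenario above, the most direct route is to fine-tune the multilingual model on the large English dataset and then run the resulting classifier on Swahili text, adding the small labeled Swahili set only if further fine-tuning is needed. The sketch below is a minimal illustration of that pipeline, assuming a Hugging Face `Trainer` workflow; the checkpoint name, the two-example English dataset, and the Swahili test sentence are placeholders.

```python
# Minimal sketch (names and data are placeholders): fine-tune a multilingual
# encoder on English sentiment labels, then apply it directly to Swahili.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "xlm-roberta-base"  # assumed checkpoint pre-trained on English and Swahili
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for the team's large English sentiment dataset.
english_texts = ["I loved this film.", "This was a waste of time."]
english_labels = [1, 0]  # 1 = positive, 0 = negative

class SentimentDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels in the format Trainer expects."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-en", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=SentimentDataset(english_texts, english_labels),
)
trainer.train()  # English supervision only

# Zero-shot inference on Swahili: the head trained on English is reused unchanged.
swahili = tokenizer("Filamu hii ilikuwa nzuri sana.", return_tensors="pt").to(model.device)
with torch.no_grad():
    prediction = model(**swahili).logits.argmax(dim=-1).item()
print("Swahili review predicted as:", "positive" if prediction == 1 else "negative")
```

Because the encoder was pre-trained on both languages, the classification head learned from English supervision can be applied to Swahili inputs without any Swahili labels (zero-shot cross-lingual transfer).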
Learn After
A machine learning team is tasked with creating a text classification model for the Malagasy language, which has a very small amount of available training data. The team decides to leverage a large, pre-trained multilingual model and then fine-tune it on their limited Malagasy dataset. To maximize the effectiveness of this approach, which pre-training strategy for the multilingual model should they prioritize? A quick way to compare candidate checkpoints is sketched after this list.
Selecting a Pre-trained Model for a Low-Resource Language
Critique of a Model Training Strategy
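Related to the Malagasy questions above: before committing to a checkpoint, one quick, practical check (an illustrative heuristic, not a method prescribed by the course) is how each candidate's tokenizer segments Malagasy text, since a model whose pre-training data covered Malagasy or closely related languages tends to produce fewer, more word-like subwords. The checkpoint names and the sample sentence below are assumptions for illustration.

```python
# Illustrative heuristic (checkpoint names and sentence are assumptions): compare how
# candidate multilingual tokenizers segment Malagasy. Heavy fragmentation suggests the
# language was thin or absent in the model's pre-training corpus.
from transformers import AutoTokenizer

candidates = ["xlm-roberta-base", "bert-base-multilingual-cased"]
malagasy = "Tsara ity boky ity."  # Malagasy: "This book is good."

for name in candidates:
    tok = AutoTokenizer.from_pretrained(name)
    subwords = tok.tokenize(malagasy)
    words = malagasy.split()
    print(f"{name}: {len(subwords)} subwords for {len(words)} words "
          f"(~{len(subwords) / len(words):.1f} subwords per word)")
    print("  ", subwords)
```

A low subword-per-word ratio is no guarantee of good downstream transfer, but a tokenizer that shatters every Malagasy word into characters is a strong warning sign that the checkpoint saw little of the language during pre-training.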