Multilingual and Language-Specific PTMs
Learning text representations that are shared across languages plays an important role in many cross-lingual NLP tasks. Two representative examples:
- Multilingual BERT (mBERT): mBERT is pre-trained with MLM, using a shared vocabulary and shared weights, on Wikipedia text from the top 104 languages. Each training sample is a monolingual document; there is no specifically designed cross-lingual objective and no cross-lingual training data. Even so, mBERT generalizes across languages surprisingly well.
- Cross-Lingual Language Model (XLM): XLM improves on mBERT by adding a cross-lingual objective, translation language modeling (TLM), which performs MLM on the concatenation of a parallel bilingual sentence pair, so the model can use the translation as context when predicting masked tokens. A simplified sketch of both input formats follows this list.
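To make the two objectives concrete, here is a minimal, illustrative sketch of how a monolingual MLM sample (mBERT-style) and a TLM sample (XLM-style) can be constructed. It is deliberately simplified: it uses whitespace tokenization and plain 15% `[MASK]` replacement, not the subword vocabularies, 80/10/10 corruption scheme, or language and position embeddings used by the actual models.

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of ordinary tokens with [MASK] (simplified MLM corruption)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok in (CLS, SEP) or rng.random() >= mask_prob:
            masked.append(tok)
            labels.append(None)   # no loss on unmasked or special positions
        else:
            masked.append(MASK)
            labels.append(tok)    # the model must recover the original token
    return masked, labels

def mlm_sample(sentence):
    """mBERT-style sample: one monolingual sentence, no cross-lingual signal."""
    return mask_tokens([CLS] + sentence.split() + [SEP])

def tlm_sample(src_sentence, tgt_sentence):
    """XLM-style TLM sample: a parallel pair concatenated into one sequence,
    so masked tokens can be predicted from either language."""
    tokens = [CLS] + src_sentence.split() + [SEP] + tgt_sentence.split() + [SEP]
    return mask_tokens(tokens)

print(mlm_sample("the cat sat on the mat"))
print(tlm_sample("the cat sat on the mat", "le chat était assis sur le tapis"))
```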
Although multilingual PTMs perform well across many languages, recent work has shown that PTMs trained on a single language significantly outperform their multilingual counterparts on that language. For Chinese, which has no explicit word boundaries, modeling larger-granularity and multi-granularity word representations has proven very successful. Monolingual PTMs have been released for a range of languages, such as CamemBERT and FlauBERT for French, FinBERT for Finnish, BERTje and RobBERT for Dutch, and AraBERT for Arabic.
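As a usage-level illustration (not part of the original text), the sketch below compares a multilingual checkpoint with a French-specific one on the same masked-word query via the Hugging Face transformers fill-mask pipeline. The model identifiers are the commonly published checkpoint names and may change; note that each tokenizer defines its own mask token.

```python
from transformers import pipeline

# Multilingual checkpoint (mBERT) vs. a French-specific checkpoint (CamemBERT).
# Checkpoint names are the commonly published Hugging Face identifiers.
multilingual = pipeline("fill-mask", model="bert-base-multilingual-cased")
french_only = pipeline("fill-mask", model="camembert-base")

# mBERT uses [MASK] as its mask token; CamemBERT uses <mask>.
print(multilingual("Paris est la [MASK] de la France.")[:3])
print(french_only("Paris est la <mask> de la France.")[:3])
```

In line with the comparison above, the language-specific model is generally expected to give stronger predictions for French, while the multilingual model covers all 104 languages with a single set of weights.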