Learn Before
Bilingual Pre-training for Multilingual Models
A significant improvement for multilingual pre-trained models, such as mBERT, involves incorporating bilingual (parallel) data into the pre-training process. In contrast to training on separate monolingual corpora alone, this approach explicitly models the relationships between tokens from two different languages. This equips the model with stronger inherent cross-lingual transfer abilities, making it more readily adaptable to new languages.
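As a minimal sketch of the idea (using plain Python with a toy whitespace tokenizer and a made-up Spanish-English sentence pair, not any particular library's API), a parallel pair can be packed into one masked-language-modeling example so that a masked token in one language can be predicted from the other language's context:

```python
import random

random.seed(0)

MASK, SEP, CLS = "[MASK]", "[SEP]", "[CLS]"

def pack_bilingual_example(src_sentence, tgt_sentence, mask_prob=0.15):
    """Pack a sentence and its translation into a single sequence and mask
    random tokens in both halves, so the model can recover a masked token
    in one language from the other language's context."""
    src_tokens = src_sentence.split()          # toy whitespace tokenizer
    tgt_tokens = tgt_sentence.split()
    tokens = [CLS] + src_tokens + [SEP] + tgt_tokens + [SEP]

    inputs, labels = [], []
    for tok in tokens:
        if tok not in (CLS, SEP) and random.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)                 # token the model must predict
        else:
            inputs.append(tok)
            labels.append(None)                # excluded from the MLM loss
    return inputs, labels

# Hypothetical Spanish-English parallel pair, purely for illustration.
inputs, labels = pack_bilingual_example(
    "el gato duerme en la cama",
    "the cat sleeps on the bed",
)
print(inputs)
print(labels)
```

Because both sentences share one input sequence, attention can align, for example, "gato" with "cat", which is the cross-language token relationship described above; this is the intuition behind objectives such as XLM's translation language modeling.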
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Cross-Lingual Learning
Bilingual Pre-training for Multilingual Models
Benefit of Multilingual Pre-trained Models: Handling Code-Switching
Shared Vocabulary in Multilingual Models
Factors Influencing Multilingual Pre-training
A company is developing a sentiment analysis tool. Their primary market is France, for which they have a massive, high-quality French dataset. They also need to provide functional support for Spanish and German, but have very limited data for these languages. The highest priority is achieving state-of-the-art performance for the French market, while still being able to handle the other languages. Given these requirements, which strategy for choosing a foundation model is most appropriate?
Model Selection for a Monolingual Task
Match each pre-trained model with the description that best characterizes its training methodology and primary use case.
Learn After
Cross-Lingual Language Models (XLM)
Bilingual Sentence Packing for Pre-training
Performance Degradation due to Interference in Bilingual Pre-training
An NLP team is developing a model for a Spanish-to-Portuguese translation service. They are considering two different pre-training strategies before fine-tuning the model on a specific translation dataset.
Strategy 1: The model is trained on a large corpus containing millions of Spanish documents and a separate, equally large corpus of Portuguese documents. During each training step, the model processes text from only one of the two languages.
Strategy 2: The model is trained on a large corpus of Spanish sentences that have been professionally translated into Portuguese. During each training step, the model processes a Spanish sentence and its corresponding Portuguese translation together.
Which statement best analyzes the likely effectiveness of these two strategies for the final translation task?
Analyzing Pre-training Strategies for Multilingual Models
Pre-training Strategy for Zero-Shot Cross-Lingual Transfer