Learn Before
Scaling Considerations for Multilingual Models
Research on large-scale multilingual models, such as XLM-style architectures, has established two key scaling principles. First, as the number of languages a model must support grows, its parameter count must grow correspondingly to maintain effective performance. Second, the shared vocabulary must also expand so that it can accommodate the linguistic diversity of the larger set of languages.
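These two principles can be made concrete with a small sketch. The function below is purely illustrative and not from the text: the base sizes and the proportional-growth heuristic are assumptions chosen for demonstration, not published scaling coefficients.

```python
# Illustrative heuristic only: scale parameter count and shared vocabulary
# roughly in proportion to the number of supported languages.
# All constants (base_params, base_vocab, base_languages) are hypothetical.

def suggest_config(num_languages: int,
                   base_params: int = 100_000_000,
                   base_vocab: int = 32_000,
                   base_languages: int = 10) -> dict:
    """Suggest model and vocabulary sizes for a multilingual model.

    Embodies the scaling principle above: capacity grows with the
    number of languages instead of staying fixed.
    """
    factor = num_languages / base_languages
    return {
        "num_languages": num_languages,
        "params": int(base_params * factor),      # larger model for more languages
        "vocab_size": int(base_vocab * factor),   # larger shared vocabulary
    }

small = suggest_config(10)    # baseline 10-language configuration
large = suggest_config(100)   # scaled-up 100-language configuration
```

Under this (assumed) linear heuristic, moving from 10 to 100 languages while freezing `params` and `vocab_size` would leave the model with a tenth of the suggested capacity per language, which is the failure mode the scenarios below probe.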
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Interference in Multilingual Models
Evaluating a Multilingual Pre-training Strategy
A research team is pre-training a multilingual language model on a dataset containing text from 50 languages. After training, they observe that the model's performance on Swahili, a language with relatively little data in the training set, is significantly worse than its performance on high-resource languages like English and Spanish. Assuming the model architecture is sound, which of the following configuration choices is the most likely cause of this performance disparity?
A team of researchers is developing a multilingual language model and encounters several performance issues. Match each observed issue with the most likely underlying configuration factor that needs adjustment, assuming the model's architecture is fixed.
Learn After
A research lab has a successful multilingual model that performs well on 10 distinct languages. The team is now tasked with building a new version to support 100 languages. To manage computational costs, they propose keeping the new model's parameter count (size) and shared vocabulary size identical to the original 10-language model. Based on established scaling principles for such models, what is the most likely outcome of this strategy?
Multilingual Model Development Strategy
Justifying Scaling Decisions in Multilingual Model Development