Learn Before
A research team is pre-training a multilingual language model on a dataset containing text from 50 languages. After training, they observe that the model's performance on Swahili, a language with relatively little data in the training set, is significantly worse than its performance on high-resource languages like English and Spanish. Assuming the model architecture is sound, which of the following configuration choices is the most likely cause of this performance disparity?
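The question turns on how the training data is mixed across languages. A common culprit is sampling each language in proportion to its raw corpus size, which leaves a low-resource language like Swahili almost unseen during training; temperature-based (exponentiated) sampling, used in multilingual models such as mBERT and XLM-R, flattens the distribution. The sketch below is illustrative only: the token counts are made up, and sampling_probs is a hypothetical helper, not code from the course.

```python
import numpy as np

# Hypothetical per-language token counts (illustrative numbers only).
token_counts = {"en": 1_000_000_000, "es": 400_000_000, "sw": 5_000_000}

def sampling_probs(counts, alpha=0.3):
    """Temperature-based sampling: p_i is proportional to q_i**alpha,
    where q_i is a language's share of the corpus. alpha < 1 flattens
    the distribution, upsampling low-resource languages."""
    langs = list(counts)
    q = np.array([counts[l] for l in langs], dtype=float)
    q /= q.sum()        # raw corpus proportions
    p = q ** alpha      # exponentiate to smooth the distribution
    p /= p.sum()        # renormalize to a probability distribution
    return dict(zip(langs, p))

print(sampling_probs(token_counts, alpha=1.0))  # proportional: Swahili ~0.4%
print(sampling_probs(token_counts, alpha=0.3))  # smoothed: Swahili ~10%
```

With alpha = 1 the mixture mirrors the raw corpus and Swahili receives well under 1% of training examples; lowering alpha toward 0 moves the mixture toward uniform, trading some high-resource exposure for better low-resource coverage.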
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Scaling Considerations for Multilingual Models
Interference in Multilingual Models
Evaluating a Multilingual Pre-training Strategy
A team of researchers is developing a multilingual language model and encounters several performance issues. Match each observed issue with the most likely underlying configuration factor that needs adjustment, assuming the model's architecture is fixed.
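One issue-to-factor pairing this kind of exercise often targets is over-fragmentation by a shared subword vocabulary: if the vocabulary was learned mostly from high-resource text, sentences in low-resource languages split into many more pieces per word, hurting performance even when some training data is present. A quick way to probe this is to compare subword "fertility" (tokens per word) across languages. A minimal sketch, assuming the Hugging Face transformers library and its public xlm-roberta-base checkpoint; the sample sentences are illustrative only.

```python
from transformers import AutoTokenizer

# Illustrative parallel sentences (the Swahili line is a rough translation).
samples = {
    "en": "The committee approved the new budget yesterday.",
    "sw": "Kamati iliidhinisha bajeti mpya jana.",
}

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

for lang, text in samples.items():
    words = text.split()
    pieces = tok.tokenize(text)
    # Fertility = subword tokens per whitespace word; higher values mean
    # the shared vocabulary fragments this language more heavily.
    print(f"{lang}: {len(pieces) / len(words):.2f} subwords per word")
```

A markedly higher fertility for one language suggests the tokenizer's training data mix, rather than the model architecture, is the configuration factor that needs adjustment.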