Learn Before
Performance Degradation due to Interference in Bilingual Pre-training
During the pre-training of a bilingual model, a phenomenon known as interference can occur: because the two languages compete for the model's shared capacity, the model's overall performance may, after a certain amount of training, begin to decline rather than continue improving.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
Cross-Lingual Language Models (XLM)
Bilingual Sentence Packing for Pre-training
Performance Degradation due to Interference in Bilingual Pre-training
An NLP team is developing a model for a Spanish-to-Portuguese translation service. They are considering two different pre-training strategies before fine-tuning the model on a specific translation dataset.
Strategy 1: The model is trained on a large corpus containing millions of Spanish documents and a separate, equally large corpus of Portuguese documents. During each training step, the model processes text from only one of the two languages.
Strategy 2: The model is trained on a large corpus of Spanish sentences that have been professionally translated into Portuguese. During each training step, the model processes a Spanish sentence and its corresponding Portuguese translation together.
Which statement best analyzes the likely effectiveness of these two strategies for the final translation task?
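The key mechanical difference between the two strategies is what the model sees in a single training step. Strategy 2 resembles translation language modeling as used in XLM, where a sentence and its translation are packed into one sequence so the model can attend across both languages. The sketch below illustrates that packing; the token names (`<s>`, `</s>`) and the double-separator convention are illustrative assumptions, not details given in the scenario.

```python
def pack_bilingual_pair(src_tokens, tgt_tokens, bos="<s>", sep="</s>"):
    """Pack a parallel sentence pair into one training sequence
    (Strategy 2), so a single step exposes the model to both
    languages and their alignment. Special tokens are assumed."""
    return [bos] + src_tokens + [sep, sep] + tgt_tokens + [sep]

# Spanish sentence and its Portuguese translation, processed together:
packed = pack_bilingual_pair(
    ["El", "gato", "duerme"],   # Spanish
    ["O", "gato", "dorme"],     # Portuguese
)
```

Under Strategy 1, by contrast, each step would contain tokens from only one of the two corpora, so the model never sees an explicit alignment signal between Spanish and Portuguese during pre-training.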
Analyzing Pre-training Strategies for Multilingual Models
Pre-training Strategy for Zero-Shot Cross-Lingual Transfer