Essay

Explaining Zero-Shot Cross-Lingual Transfer

A large multilingual model is pre-trained on a massive corpus of text from over 100 different languages. Importantly, the pre-training process only uses monolingual documents; the model never sees parallel sentences (e.g., an English sentence and its direct French translation) during this phase. After pre-training, the model is fine-tuned for a sentiment analysis task using only English-language data. Surprisingly, when this fine-tuned model is tested on German-language reviews, it performs significantly better than random chance. Analyze and explain the key mechanisms and properties of the multilingual pre-training process that enable this successful cross-lingual transfer, despite the absence of explicit cross-lingual training data.
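
To make the described setup concrete, the sketch below shows how such an experiment might be wired up with the Hugging Face Transformers library. The choice of xlm-roberta-base, the German example sentence, and the elided English fine-tuning step are illustrative assumptions, not details given in the prompt.

```python
# Minimal sketch (assumed setup, not part of the original prompt) of the
# zero-shot cross-lingual transfer pipeline described above:
# multilingual pre-training -> English-only fine-tuning -> zero-shot German test.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# xlm-roberta-base was pre-trained on monolingual text from ~100 languages,
# with no parallel sentence pairs -- the same regime the prompt describes.
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 attaches a fresh binary sentiment head (negative/positive).
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# --- Fine-tuning stage (English only) ---
# Here the model would be fine-tuned on an English sentiment dataset such as
# SST-2, e.g. with the standard transformers Trainer; omitted for brevity.
# Until that step runs, the classification head is randomly initialized and
# the prediction below is arbitrary.

# --- Zero-shot evaluation stage (German) ---
# No German labels are ever shown to the model; any transfer relies entirely
# on what multilingual pre-training put into the shared representation space.
model.eval()
# "The food was excellent and the service very friendly."
german_review = "Das Essen war ausgezeichnet und der Service sehr freundlich."
inputs = tokenizer(german_review, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted sentiment class:", logits.argmax(dim=-1).item())
```

A strong answer to the essay would explain why the English-trained head transfers at all in this setup, e.g. by appealing to the shared subword vocabulary and the language-neutral structure of the pre-trained representation space.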

Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science