1Cademy - Diagnosing Training Data Issues for a Bilingual Model

Learn Before

Example of an Aligned Bilingual Sentence Pair

Case Study

Diagnosing Training Data Issues for a Bilingual Model

A data scientist is training a model to translate between Chinese and English, with a focus on financial terms. The model consistently mistranslates the Chinese word for a financial institution, '银行' (yínháng). After inspecting the training data, the scientist suspects that a poorly aligned sentence pair is causing the issue. Review the following data samples and identify the problematic pair. Explain why this specific pair would confuse the model and hinder its learning of the target word.
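
One way to approach this kind of inspection is a simple lexical check: flag every pair whose Chinese side contains the target word but whose English side contains none of its expected translations. The sketch below is a minimal illustration only. The function name `find_suspect_pairs`, the expected translation "bank", and the three sample pairs are all assumptions made for the example, not the data from this case study.

```python
def find_suspect_pairs(pairs, source_term, expected_translations):
    """Flag pairs whose source side contains source_term while the
    target side contains none of the expected translations."""
    suspects = []
    for index, (source, target) in enumerate(pairs):
        if source_term not in source:
            continue  # pair says nothing about the target word
        target_lower = target.lower()
        if not any(t.lower() in target_lower for t in expected_translations):
            suspects.append((index, source, target))
    return suspects


# Hypothetical samples, invented for illustration (not the case study's data).
samples = [
    ("我去银行存钱。", "I went to the bank to deposit money."),
    ("这家银行提供贷款。", "The weather is nice today."),
    ("股票市场今天上涨。", "The stock market rose today."),
]

suspects = find_suspect_pairs(samples, "银行", ["bank"])
for index, source, target in suspects:
    print(f"Pair {index} looks misaligned: {source!r} -> {target!r}")
```

With these invented samples, only the second pair is flagged: its Chinese side mentions '银行' but its English side is unrelated, so the model would receive a training signal associating the word with the wrong context. A substring check like this is only a first-pass filter. It would not catch a pair in which "bank" appears in the riverbank sense, so flagged and unflagged pairs still need human review.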

Updated 2025-10-04
