Case Study

Diagnosing Training Data Issues for a Bilingual Model

A data scientist is training a model to translate between Chinese and English, specifically focusing on financial terms. The model is consistently failing to correctly translate the Chinese word for a financial institution: '银行' (yínháng). After inspecting the training data, the scientist suspects a poorly aligned sentence pair is causing the issue. Review the following data samples and identify the problematic pair. Explain why this specific pair would confuse the model and hinder its learning process for the target word.

0

1

Updated 2025-10-04

Contributors are:

Who are from:

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science