Learn Before
Diagnosing Training Data Issues for a Bilingual Model
A data scientist is training a model to translate between Chinese and English, specifically focusing on financial terms. The model is consistently failing to correctly translate the Chinese word for a financial institution: '银行' (yínháng). After inspecting the training data, the scientist suspects a poorly aligned sentence pair is causing the issue. Review the following data samples and identify the problematic pair. Explain why this specific pair would confuse the model and hinder its learning process for the target word.
0
1
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of a Packed Bilingual Sentence Sequence
A machine learning model is being trained to understand the relationship between sentences in two different languages. Which of the following pairs of sentences represents the highest-quality, most precisely aligned example for this training process?
Diagnosing Training Data Issues for a Bilingual Model
A key step in training a model to understand multiple languages is to provide it with correctly matched, or 'aligned,' sentence pairs. Match each English sentence with its direct Chinese translation to form a set of aligned pairs.