Learn Before
Impact of Incorrect Ground-Truth Labels
A dataset for a translation task contains the following training sample, formatted as 'input → target':
translate English to Spanish: The cat is black → El perro es negro
Analyze the potential negative impact on a model's learning process if it is trained on a significant number of such samples where the target text is an incorrect translation of the input text. What specific incorrect associations might the model learn from this example?
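One way to see the harm concretely: a model trained on such samples is optimized to reproduce the wrong target, so it picks up spurious input-output associations. The toy sketch below (a hypothetical illustration, not part of the course materials) counts word co-occurrences across many copies of the corrupted sample, showing how "cat" becomes tied to "perro" while the correct pairing "cat"/"gato" is never reinforced.

```python
from collections import Counter

# Hypothetical corrupted dataset: the English input "cat" is always
# paired with the Spanish word "perro" (dog) instead of "gato" (cat).
corrupted_samples = [
    ("the cat is black", "el perro es negro"),  # incorrect translation
] * 100

# Count how often each source word co-occurs with each target word;
# co-occurrence statistics like these drive what associations a
# model can learn from the data.
assoc = Counter()
for src, tgt in corrupted_samples:
    for s in src.split():
        for t in tgt.split():
            assoc[(s, t)] += 1

# The spurious association dominates, and the correct one is absent.
print(assoc[("cat", "perro")])  # 100
print(assoc[("cat", "gato")])   # 0
```

With enough such samples, minimizing training loss directly rewards producing the mistranslation, so the model has no signal from which to recover the correct mapping.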
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A text-to-text model is being trained on the following data sample formatted as 'input → output':
summarize: The solar system consists of the Sun and the astronomical objects gravitationally bound to it. Of the eight planets, the four inner terrestrial planets are Mercury, Venus, Earth, and Mars, and the four outer giant planets are Jupiter, Saturn, Uranus, and Neptune. → The solar system has eight planets, divided into inner terrestrial and outer giant groups.
Which part of this sample represents the correct, or ground-truth, label that the model is expected to learn to produce?
Analyzing Training Data Quality
Impact of Incorrect Ground-Truth Labels