Diagnosing a Realism Discrepancy in Synthesized Training Data
Case context: A machine learning practitioner is using artificial data synthesis to expand their training set. The synthetic data they generate looks completely realistic on human inspection. However, a model trained on this synthetic data performs poorly on real data, suggesting that the data is not realistic from the computer's perspective.
Question: Based on the concept of realism in artificial data synthesis, diagnose the issue with the synthesized data. What decision should the practitioner make regarding their validation process?
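Sample answer: The practitioner is facing the gap between human realism and computer realism: the synthetic data appears realistic to a person (human realism) but not to a computer (computer realism). Although the data passes human inspection, it can differ from real data in ways a person cannot perceive but a learning algorithm can pick up on. For example, if the same hour of car noise is mixed into many speech clips, each clip may sound convincing to a listener, yet a model can overfit to that one hour of noise. The practitioner should therefore update the validation process so that it does not rely solely on human judgment, but instead evaluates the synthesized data from the computer's perspective, for instance by checking whether training on it improves performance on real held-out data.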
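One common way to probe realism from the computer's perspective is a classifier two-sample test (sometimes called adversarial validation): train a classifier to tell real examples from synthetic ones and measure its held-out accuracy. The sketch below is a minimal illustration under stated assumptions, not a method taken from the source. It assumes each example has already been reduced to a numeric feature vector, and `domain_classifier_accuracy` and its helpers are hypothetical names. A nearest-centroid classifier keeps it dependency-free; it only detects shifts in average features and would miss subtler problems such as low diversity, which a stronger classifier might catch.

```python
import random
from statistics import fmean


def _centroid(rows):
    # Per-feature mean of a list of equal-length feature vectors.
    return [fmean(column) for column in zip(*rows)]


def _squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def domain_classifier_accuracy(real, synthetic, test_fraction=0.3, seed=0):
    """Hypothetical helper: held-out accuracy of a nearest-centroid classifier
    trained to tell real feature vectors (label 0) from synthetic ones (label 1).

    Around 0.5: this classifier cannot separate the two sets.
    Well above 0.5: a computer can tell synthetic data from real data.
    """
    rng = random.Random(seed)
    labeled = [(x, 0) for x in real] + [(x, 1) for x in synthetic]
    rng.shuffle(labeled)
    n_test = int(len(labeled) * test_fraction)
    test, train = labeled[:n_test], labeled[n_test:]

    # "Training" is just computing one mean vector per label.
    centroids = {
        label: _centroid([x for x, y in train if y == label]) for label in (0, 1)
    }

    # Predict the label whose centroid is closest, then score on held-out data.
    correct = sum(
        1
        for x, y in test
        if min(centroids, key=lambda c: _squared_distance(x, centroids[c])) == y
    )
    return correct / len(test)


if __name__ == "__main__":
    rng = random.Random(42)
    real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    look_alike = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    shifted = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(1000)]
    print(domain_classifier_accuracy(real, look_alike))  # expected near 0.5
    print(domain_classifier_accuracy(real, shifted))     # expected well above 0.5
```

An accuracy well above 0.5 means the synthetic set fails this computer-side check even if every example looks fine to a person; an accuracy near 0.5 only shows that this particular classifier could not separate the sets.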
Key points:
- Identify that the synthetic data lacks realism from the computer's perspective.
- Explain that data appearing realistic to a person does not guarantee it appears realistic to a computer.
- Decide to modify the validation process to assess realism from the computer's perspective rather than relying solely on human inspection.
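The last key point can be made concrete by letting a model's performance on held-out real data, rather than a person's impression, decide whether the synthetic data helps: train once on real data only and once on real plus synthetic data, then compare both on the same real dev set. The sketch below is a minimal, hypothetical illustration; the 1-D nearest-centroid "model", the Gaussian data, and the names `train_nearest_centroid`, `accuracy`, `compare_on_real_dev`, and `sample` are assumptions chosen to keep it self-contained.

```python
import random
from statistics import fmean


def train_nearest_centroid(examples):
    """Hypothetical stand-in for a real model: stores the mean 1-D feature
    value for each label in a list of (feature, label) pairs."""
    labels = {y for _, y in examples}
    return {label: fmean(x for x, y in examples if y == label) for label in labels}


def accuracy(model, examples):
    # Predict the label whose stored mean is nearest to each feature value.
    correct = sum(
        1
        for x, y in examples
        if min(model, key=lambda label: abs(x - model[label])) == y
    )
    return correct / len(examples)


def compare_on_real_dev(real_train, synthetic, real_dev):
    """Train with and without the synthetic data; score both on REAL dev data."""
    baseline = accuracy(train_nearest_centroid(real_train), real_dev)
    augmented = accuracy(train_nearest_centroid(real_train + synthetic), real_dev)
    return baseline, augmented


def sample(rng, mean, std, label, n):
    # Draw n labeled 1-D examples from a Gaussian (hypothetical toy data).
    return [(rng.gauss(mean, std), label) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    real_train = sample(rng, -1, 1, 0, 200) + sample(rng, 1, 1, 1, 200)
    real_dev = sample(rng, -1, 1, 0, 500) + sample(rng, 1, 1, 1, 500)
    # Hypothetical synthetic class-1 data whose features sit far from real class 1.
    synthetic = sample(rng, 4, 0.2, 1, 800)
    baseline, augmented = compare_on_real_dev(real_train, synthetic, real_dev)
    print(f"real only: {baseline:.2f}, real + synthetic: {augmented:.2f}")
```

In this toy setup the synthetic class-1 examples pull the learned class mean away from where real class-1 data lies, so accuracy on the real dev set drops. A drop like this is the signal to revisit the synthesis process; the same with-and-without comparison applies to any real model and metric.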
Rubric: The response should diagnose that the data lacks computer realism despite having human realism, and state that the practitioner must decide to evaluate realism from the computer's perspective rather than relying only on human inspection.
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI