Case Study

Diagnosing a Realism Discrepancy in Synthesized Training Data

Case context: A machine learning practitioner is using artificial data synthesis to expand a training set. The synthetic data appears completely realistic on human inspection. However, a model trained on this data performs poorly, which suggests the data is not realistic from the computer's perspective.

Question: Based on the concept of realism in artificial data synthesis, diagnose the issue with the synthesized data. What decision should the practitioner make regarding their validation process?

Sample answer: The practitioner has hit the gap between human realism and computer realism: synthetic data can look realistic to a person yet still not look realistic to a computer. Although the data passes human inspection, it contains statistical discrepancies, or lacks properties of real data, that the model is sensitive to. The practitioner should update the validation process so that it no longer relies solely on human judgment and instead evaluates the realism of the synthesized data from the computer's perspective.
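
One possible way to assess realism from the computer's perspective, offered here as an illustrative sketch rather than the method this case study prescribes, is a discriminator check: train a simple classifier to tell real samples from synthetic ones and look at its held-out accuracy. Accuracy near 0.5 means the classifier cannot separate the two sets; accuracy well above 0.5 means the synthetic data carries a signature a model can pick up, even if a person cannot see it. The function name `discriminator_accuracy`, the Gaussian toy data, and all hyperparameters below are assumptions made for illustration.

```python
import numpy as np


def discriminator_accuracy(real, synthetic, seed=0, epochs=500, lr=0.1):
    """Train a logistic-regression discriminator (real=0, synthetic=1)
    and return its accuracy on a held-out 30% split.

    Hypothetical helper for illustration; not taken from the case study.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([real, synthetic]).astype(float)
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])

    # Shuffle, then hold out 30% of the combined data for evaluation.
    order = rng.permutation(len(X))
    X, y = X[order], y[order]
    split = int(0.7 * len(X))
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # Standardize features using training-set statistics only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8
    X_train = (X_train - mu) / sigma
    X_test = (X_test - mu) / sigma

    # Plain batch gradient descent on the logistic loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        grad = p - y_train
        w -= lr * (X_train.T @ grad) / len(y_train)
        b -= lr * grad.mean()

    predictions = (X_test @ w + b) > 0
    return float((predictions == y_test).mean())


# Toy data (assumed): "real" samples and two candidate synthetic sets.
rng = np.random.default_rng(1)
real = rng.normal(size=(2000, 5))
faithful = rng.normal(size=(2000, 5))   # drawn from the same distribution
shifted = rng.normal(size=(2000, 5))
shifted[:, 0] += 2.0                    # one feature is systematically off

print("faithful synthetic:", discriminator_accuracy(real, faithful))
print("shifted synthetic: ", discriminator_accuracy(real, shifted))
```

A linear discriminator only detects differences that are linearly separable, so a score near 0.5 from this sketch is not proof of realism. A stronger classifier, or training on the synthetic data and evaluating on a held-out set of real data, would be a more demanding check.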

Key points:

  • Identify that the synthetic data lacks realism from the computer's perspective.
  • Explain that data appearing realistic to a person does not guarantee it appears realistic to a computer.
  • Decide to modify the validation process to assess realism from the computer's perspective rather than relying solely on human inspection.

Rubric: The response should diagnose that the data lacks computer realism despite having human realism, and state that the practitioner must decide to evaluate realism from the computer's perspective rather than relying only on human inspection.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI