Inconsistent Auxiliary Data Source
An auxiliary data source is inconsistent with the target task when the same input features imply different labels depending on which source an example comes from. For example, if the goal is only to predict New York City housing prices, Detroit housing data is inconsistent: a house of the same size can have a very different price in the two cities, so mixing the two datasets would hurt performance.
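The effect can be illustrated with a small sketch. The prices per square foot below are made-up numbers chosen only to make the two cities disagree, and the helper names `fit_slope` and `mean_abs_error` are hypothetical; the model is a deliberately simple one-parameter fit (price = slope * size) that has no way to tell the two sources apart.

```python
import numpy as np

def fit_slope(sizes, prices):
    """Least-squares slope for the model price = slope * size (no intercept)."""
    return float(np.dot(sizes, prices) / np.dot(sizes, sizes))

def mean_abs_error(slope, sizes, prices):
    """Mean absolute error of the one-parameter model on the given data."""
    return float(np.mean(np.abs(slope * sizes - prices)))

# Hypothetical, noise-free toy data: same house sizes in both cities.
sizes = np.array([500.0, 1000.0, 1500.0, 2000.0])  # square feet
NYC_PRICE_PER_SQFT = 1000.0      # assumed value, for illustration only
DETROIT_PRICE_PER_SQFT = 100.0   # assumed value, for illustration only
nyc_prices = NYC_PRICE_PER_SQFT * sizes
detroit_prices = DETROIT_PRICE_PER_SQFT * sizes

# Model 1: trained on the target task's data only.
slope_nyc_only = fit_slope(sizes, nyc_prices)

# Model 2: trained on the merged data, where identical sizes carry
# conflicting labels depending on the city.
merged_sizes = np.concatenate([sizes, sizes])
merged_prices = np.concatenate([nyc_prices, detroit_prices])
slope_merged = fit_slope(merged_sizes, merged_prices)

# Both models are evaluated on NYC data, the only task we care about.
error_nyc_only = mean_abs_error(slope_nyc_only, sizes, nyc_prices)
error_merged = mean_abs_error(slope_merged, sizes, nyc_prices)

print("slope (NYC only):", slope_nyc_only, "NYC error:", error_nyc_only)
print("slope (merged):  ", slope_merged, "NYC error:", error_merged)
```

In this toy setup the merged model settles on a compromise slope between the two cities, so its error on NYC houses is larger than that of the NYC-only model. Real data is noisier and real models are more flexible, so the size of the effect will differ.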
References
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Adding More Training Data Does Not Always Help
Special Challenges from Different Training and Dev/Test Distributions
Risk of Merging Training Data Sources Depends on Algorithm Flexibility
Shared Label Mapping Across Data Sources
Training and Dev/Test Sets from Different Distributions
Approximating Future Dev/Test Data Before Launch
Updating Dev/Test Sets with Actual User Data After Launch
Risk of Starting with Website Images When Future-Like Data Is Unavailable
Development Investment for Dev and Test Sets Requires Judgment
According to Machine Learning Yearning, what is the primary criterion for choosing dev and test sets?
True or False: When building a dev/test set, it is safe to assume the training distribution is the same as the test distribution.
Dev and test sets should contain examples that reflect what you ultimately want to perform well on, rather than only the _____ you happen to have for training.
Why is using a simple 30% random split of available data as your test set problematic when future data differs from training data?
According to ML Yearning, it is generally safe to assume your training data distribution is the same as your test data distribution.
Dev and test sets should be chosen to reflect data you expect to get in the _____ and want to do well on.
Match each dev/test set concept from ML Yearning to its correct description.
Order the steps for correctly choosing dev and test sets according to ML Yearning's guidance.
According to ML Yearning, what should the examples in your dev and test sets primarily reflect?
According to ML Yearning, dev and test sets must always come from the same distribution as the training data.
ML Yearning warns that the test set should not simply be _____ of the available data when future data differs from the training set.
Match each data scenario to the correct dev/test set strategy decision according to ML Yearning.
Order the reasoning steps for deciding whether a proposed dev/test set is well-chosen, per ML Yearning.
Why Standard Data Splits Fail With Different Future Distributions
Dev and Test Set Design for Mobile Image Applications
The Core Criterion for Dev and Test Set Selection
Learn After
Adding a Source Indicator Feature for Inconsistent Data
Effect of mixing inconsistent Detroit housing data when predicting NYC prices
Consistency of housing price data between NYC and Detroit
Handling _____ auxiliary data in target task training
Terms related to inconsistent auxiliary data sources
Decision process for evaluating auxiliary data consistency
When is an auxiliary data source inconsistent with the target task?
Performance impact of mixing inconsistent datasets
Relative pricing of Detroit housing compared to _____ prices
Matching scenarios with their consistency classification
Sequence explaining why mixing Detroit and NYC data hurts performance