Decide on a pipeline structure based on available data
Case context: You are building a system for autonomous vehicles. You have access to a massive dataset of bounding boxes around pedestrians, but only a moderate amount of data mapping raw camera pixels directly to steering commands.
Question: Based on the availability of data for intermediate modules, what structural decision should you make for your machine learning pipeline, and why?
Sample answer: I should consider a multi-stage pipeline. The massive dataset of pedestrian bounding boxes is plentiful training data for an intermediate module (a pedestrian detector). A multi-stage structure lets me use all of that data to train the detector well, so it could outperform an end-to-end approach that has only a moderate amount of pixel-to-steering data to learn from.
Key points:
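- The pedestrian bounding-box data is training data for an intermediate module (the pedestrian detector)
- A multi-stage pipeline is the structure to consider when such intermediate data is plentiful
- The multi-stage structure puts the available data to use by training the intermediate detector on it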
Rubric: The learner must identify the bounding box data as intermediate module data and conclude that a multi-stage pipeline is the better choice here, citing the ability to use that data to train the pedestrian detector.
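Illustration: the structure described in the sample answer can be sketched in a few lines of Python. This is a minimal sketch, not a real driving stack. The names `MultiStagePipeline`, `stub_detector`, and `stub_planner` are hypothetical, the image is stood in for by a plain dictionary, and the steering rule is a toy assumption. The point is only that the detector is a separate stage, so it can be trained on the large bounding-box dataset independently of the scarcer pixel-to-steering data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A box is (x_min, y_min, x_max, y_max) in normalized image coordinates (0..1).
Box = Tuple[float, float, float, float]


@dataclass
class MultiStagePipeline:
    """Hypothetical two-stage pipeline: detect pedestrians, then plan steering."""

    # Stage 1: would be trained on the massive bounding-box dataset.
    detect_pedestrians: Callable[[Dict], List[Box]]
    # Stage 2: maps detections (not raw pixels) to a steering command.
    plan_steering: Callable[[List[Box]], float]

    def __call__(self, image: Dict) -> float:
        boxes = self.detect_pedestrians(image)
        return self.plan_steering(boxes)


def stub_detector(image: Dict) -> List[Box]:
    # Stand-in for a learned detector: reads boxes stored in a fake "image".
    return image["pedestrians"]


def stub_planner(boxes: List[Box]) -> float:
    # Toy rule (an assumption): steer away from the mean pedestrian position.
    # Positive means steer right, negative means steer left, 0.0 means straight.
    if not boxes:
        return 0.0
    mean_center_x = sum((b[0] + b[2]) / 2 for b in boxes) / len(boxes)
    return 0.5 - mean_center_x


pipeline = MultiStagePipeline(stub_detector, stub_planner)
empty_road = pipeline({"pedestrians": []})
pedestrian_on_left = pipeline({"pedestrians": [(0.1, 0.2, 0.3, 0.8)]})
```

An end-to-end alternative would replace both stages with a single model from pixels to steering, which could only learn from the moderate amount of direct pixel-to-steering data.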
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Autonomous Driving Data Availability Favors Intermediate Detectors
Why consider a multi-stage pipeline?
Is a multi-stage pipeline structure potentially superior?
Available data for _____ in a pipeline
Match multi-stage pipeline concepts
Sequence the decision process for a multi-stage pipeline
Analyze the impact of intermediate module data
Decide on a pipeline structure based on available data
What makes a multi-stage pipeline superior?
Which is an intermediate module?
Does limited data favor multi-stage pipelines?