Case Study

Decide on a pipeline structure based on available data

Case context: You are building a system for autonomous vehicles. You have access to a massive dataset of bounding boxes around pedestrians, but only a moderate amount of data mapping raw camera pixels directly to steering commands.

Question: Based on the availability of data for intermediate modules, what structural decision should you make for your machine learning pipeline, and why?

Sample answer: I should consider using a multi-stage pipeline. The massive dataset of pedestrian bounding boxes is plentiful training data for an intermediate module, a pedestrian detector. A multi-stage structure lets me train that detector on all of this data, whereas an end-to-end model could learn only from the moderate amount of data that maps camera pixels directly to steering commands. For that reason, the multi-stage pipeline could outperform the end-to-end approach.

Key points:

  • Bounding box data serves as intermediate module data
  • A multi-stage pipeline is the recommended structure
  • The structure uses the available data to train the intermediate detector
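
The structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real driving stack: the names `Box`, `detect_pedestrians`, `plan_steering`, and `drive` are hypothetical, the detector is a stub that stands in for a model trained on the bounding-box dataset, and the steering rule is a toy. The point is only that the pipeline has an explicit intermediate output (boxes) that can be supervised with its own data.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical types and functions, for illustration only.
Image = Sequence[Sequence[float]]  # placeholder camera frame: rows of pixel intensities


@dataclass
class Box:
    """Axis-aligned pedestrian bounding box in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def detect_pedestrians(image: Image) -> List[Box]:
    """Stage 1 stand-in. In a real system this would be a detector
    trained on the large bounding-box dataset."""
    # Toy rule: treat bright pixels as pedestrians and box the columns they span.
    bright_cols = [col for row in image for col, pixel in enumerate(row) if pixel > 0.5]
    if not bright_cols:
        return []
    return [Box(min(bright_cols), 0, max(bright_cols) + 1, len(image))]


def plan_steering(boxes: List[Box], image_width: float) -> float:
    """Stage 2 stand-in. Maps detected boxes to a steering command in [-1, 1].
    This stage sees boxes rather than raw pixels."""
    if not boxes:
        return 0.0  # nothing detected: keep going straight
    centre = sum((b.x_min + b.x_max) / 2 for b in boxes) / len(boxes)
    offset = centre / image_width - 0.5  # negative means pedestrians are left of centre
    # Toy rule: steer away from the pedestrians, clipped to the valid range.
    return max(-1.0, min(1.0, -2.0 * offset))


def drive(image: Image) -> float:
    """Multi-stage pipeline: pixels -> boxes -> steering command."""
    boxes = detect_pedestrians(image)  # intermediate output, trainable on box labels
    return plan_steering(boxes, image_width=len(image[0]))


if __name__ == "__main__":
    frame = [[1.0, 0.0, 0.0, 0.0]]  # one bright pixel on the left edge
    print(drive(frame))
```

An end-to-end alternative would replace `drive` with a single learned function from pixels to steering, which could be trained only on the pixel-to-steering data and would have no place to use the bounding-box labels.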

Rubric: The learner must identify the bounding box data as intermediate module data and conclude that a multi-stage pipeline is the recommended choice, citing the ability to use that data to train the pedestrian detector.

Updated 2026-06-13

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI