Case Study

Addressing Limited Training Data in a New Speech Recognition Application

Case context: You are building a new speech recognition application for a specialized domain, but you only have a very small dataset of audio recordings. Your team is deciding whether to use a pure end-to-end deep learning model or to incorporate hand-engineered components like MFCCs and phoneme representations.

Question: Based on the principles from Machine Learning Yearning, which approach should your team choose given the limited data, and how do the specific components justify this choice?

Sample answer: The team should incorporate hand-engineered components, because a pipeline with more hand-engineered components generally lets a speech system learn from less data. MFCCs help because they are robust to properties of speech that do not affect content, such as speaker pitch, so the learning algorithm faces a simpler input. Phonemes help because they give the algorithm a ready-made representation of the basic units of sound, which it would otherwise have to discover from data. In a low-data setting, this hand-engineered knowledge supplements what the algorithm can learn from the small dataset.
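
To make the MFCC point concrete, here is a minimal NumPy-only sketch of an MFCC front end. It is an illustration under stated assumptions, not the method prescribed by Machine Learning Yearning: the function names, the 16 kHz sample rate, the 25 ms frame and 10 ms hop, the 26 mel filters, and the 13 coefficients are common conventions chosen here for the example, and a synthetic tone stands in for a real recording. The sketch shows how each frame of raw audio is reduced to a handful of coefficients, which is the sense in which a hand-engineered feature simplifies the input before any learning happens.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumptions."""
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame (zero-padded to n_fft samples).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Pool the spectrum into mel bands, then take the log.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small constant avoids log(0)
    # 4. Unnormalized DCT-II across the mel bands; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Synthetic stand-in for a one-second recording: a 220 Hz tone plus light noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
audio = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(16000)

features = mfcc(audio)
print(audio.shape, "->", features.shape)  # (16000,) -> (98, 13)
```

With these assumed settings, one second of audio (16,000 samples) becomes 98 frames of 13 coefficients each. A downstream model then learns from this compact summary rather than from raw samples, which is one reason a pipeline built on such features can get by with less training data.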

Key points:

  • Use hand-engineered components because training data is limited
  • MFCCs are robust to properties that do not affect content, such as speaker pitch
  • Phonemes represent the basic units of sound
  • Hand-engineered knowledge supplements what the algorithm learns from the limited data

Rubric: The learner must correctly diagnose that hand-engineered components should be used due to the low-data constraint. They must justify this by explaining that MFCCs are robust to irrelevant properties and phonemes help represent basic sounds, which together supplement the algorithm's data-driven learning.

Updated 2026-06-12

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI