1Cademy - Diagnosing Speech Recognition Performance in Car Audio Subsets

Learn Before

Checking a Mismatch Hypothesis on Training Dev Subsets

Case Study


Case context: You are developing an in-car speech recognition system. Your training set contains a mix of clean audio and some car-recorded audio, and your training dev set likewise includes a small subset of car-recorded audio. The model's overall performance is low, and you suspect that a data mismatch involving car audio is the primary issue.

Question: According to the principles of checking mismatch hypotheses on training dev subsets, what diagnostics should you perform on these subsets, and what result would confirm that obtaining additional car-recorded audio is the correct next step?

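Sample answer: Isolate the car-recorded audio subset in both the training set and the training dev set, and evaluate the system on each. If the system performs well on the training set's car audio but poorly on the training dev set's car audio, this result validates the hypothesis that collecting more car-recorded audio is necessary and would improve performance. Because the training dev set is drawn from the same distribution as the training set, a gap on the same car-audio subset means the model has fit the car examples it was trained on without generalizing from them, which is the kind of problem more car data addresses.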

Key points:

Evaluate system performance specifically on the car audio subset within the training set.
Evaluate system performance specifically on the car audio subset within the training dev set.
Confirm the need for more car data if performance is high on the training subset but low on the training dev subset.
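A minimal Python sketch of the three checks above, assuming each utterance carries a hypothetical `is_car_audio` metadata flag plus per-utterance word-error and reference-word counts, so that performance can be measured as corpus-level word error rate (WER). The thresholds `max_train_wer` and `min_gap` are illustrative assumptions, not values prescribed by the principle; "high performance" and "poor performance" have to be defined for your own system.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Utterance:
    is_car_audio: bool  # hypothetical metadata flag: True if recorded in a car
    word_errors: int    # substitutions + deletions + insertions for this utterance
    ref_words: int      # number of words in the reference transcript


def subset_wer(utts: Sequence[Utterance], keep: Callable[[Utterance], bool]) -> float:
    """Corpus-level WER over the utterances selected by `keep`."""
    chosen = [u for u in utts if keep(u)]
    total_ref = sum(u.ref_words for u in chosen)
    if total_ref == 0:
        raise ValueError("selected subset has no reference words")
    return sum(u.word_errors for u in chosen) / total_ref


def car_data_would_help(
    train: Sequence[Utterance],
    train_dev: Sequence[Utterance],
    max_train_wer: float = 0.10,  # assumed cutoff for "performs well"
    min_gap: float = 0.05,        # assumed minimum train -> train dev degradation
) -> tuple[float, float, bool]:
    # Step 1: car audio subset within the training set.
    train_wer = subset_wer(train, lambda u: u.is_car_audio)
    # Step 2: car audio subset within the training dev set.
    train_dev_wer = subset_wer(train_dev, lambda u: u.is_car_audio)
    # Step 3: good on training car audio, clearly worse on training dev car audio.
    verdict = train_wer <= max_train_wer and train_dev_wer - train_wer >= min_gap
    return train_wer, train_dev_wer, verdict


if __name__ == "__main__":
    train = [Utterance(True, 2, 100), Utterance(True, 3, 100), Utterance(False, 1, 100)]
    train_dev = [Utterance(True, 20, 100), Utterance(False, 2, 100)]
    print(car_data_would_help(train, train_dev))  # (0.025, 0.2, True)
```

Aggregating errors and reference words before dividing, rather than averaging per-utterance rates, keeps long utterances from being underweighted. If the car subset were also poor in the training set itself, the verdict would be `False`: that pattern points to the model failing to fit car audio at all, not to a shortage of car examples.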

Rubric: The response must specify evaluating the car audio subset in both the training and training dev sets, and state that high training performance combined with poor training dev performance on this subset validates collecting more car data.

Updated 2026-06-17
