1Cademy - Scaling a speech recognition system

Learn Before

Large Neural Networks Benefit from Huge Data

Case Study

Scaling a speech recognition system

Case context: A team has built a speech recognition system using a moderately sized neural network and a dataset of 10,000 hours of audio. The performance has plateaued. They have the budget to either significantly increase the size of their neural network or collect 100,000 more hours of audio, but they cannot do both immediately.

Question: Given the principle that the best performance comes from combining a very large network with a huge dataset, what should the team recognize about their ultimate goal, even if they must scale one dimension at a time?

Sample answer: The team should recognize that reaching the best achievable performance will ultimately require both a very large neural network and a huge amount of data. Scaling only one dimension will eventually run into limits, so their long-term roadmap must plan to expand both network capacity and dataset volume, even if the two steps happen in sequence.
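
To make the "scaling only one dimension has limits" point concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the function `toy_error`, its additive form, the exponents, and the constants are hypothetical and are not measurements from any real speech recognition system. Only the 10,000-hour baseline and the 100,000 extra hours come from the case context.

```python
# Toy illustration only. The functional form and every constant below are
# assumptions chosen for readability, not results from a real system.

def toy_error(params_millions: float, audio_hours: float) -> float:
    """Hypothetical error rate: an irreducible floor, plus one term that
    shrinks as the network grows and one that shrinks as the data grows."""
    floor = 0.02                                  # assumed irreducible error
    model_term = 0.5 / params_millions ** 0.5     # assumed model-size penalty
    data_term = 3.0 / audio_hours ** 0.5          # assumed data-size penalty
    return floor + model_term + data_term

# Hypothetical "moderately sized" network of 100M parameters, 10,000 hours.
baseline = toy_error(100, 10_000)
# Option A: a much larger network (assumed 100x), same data.
bigger_net = toy_error(10_000, 10_000)
# Option B: same network, 100,000 more hours of audio.
more_data = toy_error(100, 110_000)
# Long-term goal: both.
both = toy_error(10_000, 110_000)

for name, err in [("baseline", baseline), ("bigger network only", bigger_net),
                  ("more data only", more_data), ("both", both)]:
    print(f"{name:>20}: {err:.4f}")

# Even an absurdly large network cannot push the error below the floor
# plus the data term while the dataset stays at 10,000 hours.
huge_net_same_data = toy_error(1e12, 10_000)
print(f"{'huge net, same data':>20}: {huge_net_same_data:.4f}")
```

In this sketch, either single step improves on the baseline, but the combination is better than both, and growing the network without adding data stalls at a level set by the data term. Which single step helps more here is purely an artifact of the assumed constants, so it should not be read as a recommendation for the team's first move.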

Key points: