Example

Traditional vs. End-to-End Speech Recognition Pipelines

The traditional approach to speech recognition relies on a pipeline with several intermediate components:

Audio (input)feature extractionphoneme detectionword compositiontext transcript (output)\text{Audio (input)} \rightarrow \text{feature extraction} \rightarrow \text{phoneme detection} \rightarrow \text{word composition} \rightarrow \text{text transcript (output)}

The end-to-end approach replaces this multi-step chain with a single deep neural network, allowing the system to be optimized directly for the final output using a single criterion:

Audio (input)Deep Neural Networktext transcript (output)\text{Audio (input)} \rightarrow \text{Deep Neural Network} \rightarrow \text{text transcript (output)}

0

2

Updated 2026-06-13

Tags

Data Science

Related