1Cademy - Traditional vs. End-to-End Speech Recognition Pipelines

The traditional approach to speech recognition relies on a pipeline with several intermediate components:

$\text{Audio (input)} \rightarrow \text{feature extraction} \rightarrow \text{phoneme detection} \rightarrow \text{word composition} \rightarrow \text{text transcript (output)}$
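As a rough illustration of this chain, the stages can be sketched as separate functions composed in order. Everything below is a toy assumption made for illustration: the function names, the frame size, the loudness threshold, and the two-symbol "phoneme" inventory are hypothetical stand-ins, not real acoustic or language models.

```python
from typing import List, Sequence


def extract_features(audio: Sequence[float], frame_size: int = 4) -> List[float]:
    """Toy feature extraction: mean absolute amplitude of each fixed-size frame."""
    frames = [audio[i:i + frame_size] for i in range(0, len(audio), frame_size)]
    return [sum(abs(x) for x in frame) / len(frame) for frame in frames]


def detect_phonemes(features: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Toy phoneme detection: a loud frame is 'AH', a quiet frame is 'sil' (silence)."""
    return ["AH" if value > threshold else "sil" for value in features]


def compose_words(phonemes: Sequence[str]) -> str:
    """Toy word composition: each run of consecutive 'AH' frames becomes the word 'ah'."""
    words = []
    previous = "sil"
    for phoneme in phonemes:
        if phoneme == "AH" and previous != "AH":
            words.append("ah")
        previous = phoneme
    return " ".join(words)


def traditional_pipeline(audio: Sequence[float]) -> str:
    """Audio -> feature extraction -> phoneme detection -> word composition -> text."""
    return compose_words(detect_phonemes(extract_features(audio)))


# Hypothetical input: a loud frame, a quiet frame, then another loud frame.
audio = [0.9, 0.8, 1.0, 0.7, 0.0, 0.1, 0.0, 0.0, 0.9, 0.9, 0.9, 0.9]
print(traditional_pipeline(audio))  # prints "ah ah"
```

Each stage here has its own hand-chosen rule, so improving one stage does not automatically improve the final transcript.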

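The end-to-end approach replaces this multi-stage chain with a single deep neural network that maps audio directly to the transcript. Because there are no separately built intermediate stages, the whole system can be optimized for the final output under a single training criterion: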

$\text{Audio (input)} \rightarrow \text{Deep Neural Network} \rightarrow \text{text transcript (output)}$
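The idea of one model trained under one criterion can be sketched with a deliberately tiny example. This is a minimal sketch under stated assumptions: a one-layer logistic model stands in for the deep neural network, the output vocabulary is a single hypothetical word ("ah") versus silence, and the training frames are made up. The point is only that raw audio goes in, text comes out, and every parameter is updated from one loss defined on the final output.

```python
import math
import random
from typing import List, Sequence, Tuple

# (raw audio frame, label): 1 means the frame should transcribe to "ah", 0 means silence.
Example = Tuple[Sequence[float], int]


def predict_proba(weights: Sequence[float], bias: float, frame: Sequence[float]) -> float:
    """Probability that the raw frame contains the word 'ah'."""
    z = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: Sequence[Example], epochs: int = 200, lr: float = 0.5,
          seed: int = 0) -> Tuple[List[float], float]:
    """Fit all parameters with stochastic gradient descent on a single cross-entropy loss."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for frame, label in examples:
            # Gradient of the cross-entropy loss with respect to z is (p - label).
            error = predict_proba(weights, bias, frame) - label
            weights = [w - lr * error * x for w, x in zip(weights, frame)]
            bias -= lr * error
    return weights, bias


def transcribe(weights: Sequence[float], bias: float, frame: Sequence[float]) -> str:
    """Audio -> single model -> text, with no hand-built intermediate stages."""
    return "ah" if predict_proba(weights, bias, frame) > 0.5 else ""


# Hypothetical training set: two loud frames labelled "ah", two quiet frames labelled silence.
training_data: List[Example] = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.0, 0.1, 0.0, 0.0], 0),
    ([0.8, 0.9, 0.7, 1.0], 1),
    ([0.1, 0.0, 0.0, 0.1], 0),
]
weights, bias = train(training_data)
print(repr(transcribe(weights, bias, [0.9, 0.9, 0.9, 0.9])))
print(repr(transcribe(weights, bias, [0.0, 0.0, 0.0, 0.0])))
```

A real end-to-end recognizer would use a deep sequence model and a sequence-level loss, but the training loop has the same shape: one model, one objective, gradients flowing from the transcript error to every parameter.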