Multiple Choice

What does an end-to-end speech recognition system directly output when given an audio clip as input?