Fill in the Blank

An end-to-end speech recognition system takes a(n) _____ as input and directly outputs the transcript.