Learn Before
End-to-End Text-to-Speech
An end-to-end text-to-speech (TTS) system takes text features as input and produces audio as its output.
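As a rough illustration of that input/output shape only, the sketch below maps toy text features to a list of audio samples. Everything in it is an assumption made for illustration: `text_to_features` and `toy_tts` are hypothetical names, the features are plain character codes, and the "model" is a fixed sine-burst rule rather than a trained network. A real end-to-end system would learn this mapping from labeled (text features, audio) pairs.

```python
import math


def text_to_features(text):
    """Toy text features: one integer code per character (illustrative assumption)."""
    return [ord(ch) for ch in text.lower()]


def toy_tts(features, sample_rate=8000, samples_per_symbol=80):
    """Stand-in for a learned model: emits a short sine burst per input feature.

    A trained end-to-end TTS model would replace this hand-written rule.
    """
    audio = []
    for f in features:
        # Pick a frequency from the feature value; this rule is arbitrary.
        freq = 200.0 + 5.0 * (f % 100)
        for n in range(samples_per_symbol):
            audio.append(math.sin(2 * math.pi * freq * n / sample_rate))
    return audio


features = text_to_features("hello")  # input: text features
audio = toy_tts(features)             # output: audio samples in [-1, 1]
```

The point of the sketch is the signature: a sequence of text features goes in, and a much longer sequence of audio samples comes out, which is what makes the output "rich" compared with a single number or class label.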
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
End-to-End Image Captioning
End-to-End Text-to-Speech
End-to-End Question Answering
End-to-End Machine Translation as Rich Output Learning
End-to-End Speech Recognition as Rich Output Learning
Which of the following best describes the outputs that end-to-end deep learning can directly learn?
End-to-end deep learning is limited to predicting outputs that are single numbers.
To train an end-to-end system that produces rich outputs, you need the right labeled _____ pairs.
Match each output category to an example of a rich output in end-to-end deep learning.
Order the reasoning steps a practitioner follows when deciding whether end-to-end learning can produce a rich output.
Which condition does ML Yearning identify as the key prerequisite for end-to-end learning to produce rich outputs?
A sentence is an example of a rich output that end-to-end deep learning can learn to produce directly.
ML Yearning describes the ability to learn rich outputs end-to-end as 'an accelerating _____ in deep learning.'
Match each end-to-end deep learning application to the type of rich output it produces.
Order the steps for building an end-to-end deep learning system that produces a rich output such as a translated sentence.
Learn After
What does an end-to-end TTS system take as input according to Machine Learning Yearning?
In an end-to-end TTS system as described in Machine Learning Yearning, the direct output of the model is audio.
In end-to-end TTS, _____ are used as input to the model to directly produce audio.
Match each element of the end-to-end TTS pipeline to its role in the system.
Order the stages of an end-to-end TTS pipeline from input to final output as described in Machine Learning Yearning.
Why does Machine Learning Yearning classify end-to-end TTS under 'Directly Learning Rich Outputs'?
According to Machine Learning Yearning, the TTS pipeline flows from audio input to text feature output.
Machine Learning Yearning (p. 103) describes TTS as mapping text features to _____ as its rich output.
Match each term to its description in Machine Learning Yearning's treatment of end-to-end TTS.
Order the reasoning steps for classifying an end-to-end TTS system as a 'directly learning rich outputs' problem.