Text to speech with Amazon Polly
Amazon Polly converts text into speech using pretrained deep learning models. Because it is a fully managed service, you do not have to train, deploy, or host any models yourself. You provide the input either as plain text or in Speech Synthesis Markup Language (SSML) format, and Amazon Polly returns a synthesized audio stream. It also offers a range of languages and voices to choose from, including both male and female voices. The output audio can be saved in MP3 format for further use in a web or mobile application. Alternatively, Amazon Polly can return JSON output containing speech marks, which are metadata describing the synthesized speech, such as when each word begins in the audio stream.
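A minimal sketch of such a request using the AWS SDK for Python (boto3) might look like the following. It assumes boto3 is installed and that AWS credentials and a region are already configured. The voice `Joanna`, the helper names `build_polly_request` and `synthesize_to_file`, and their parameters are illustrative choices, not something prescribed by this text.

```python
def build_polly_request(text, voice_id="Joanna", output_format="mp3",
                        ssml=False, speech_mark_types=None):
    """Assemble keyword arguments for Polly's synthesize_speech call.

    voice_id="Joanna" is an assumed default; any voice Polly supports works.
    """
    request = {
        "Text": text,
        # SSML input must be flagged so Polly parses the markup.
        "TextType": "ssml" if ssml else "text",
        "OutputFormat": output_format,
        "VoiceId": voice_id,
    }
    if speech_mark_types:
        # Speech marks are only returned as JSON, so the output format is
        # switched to "json" whenever speech marks are requested.
        request["OutputFormat"] = "json"
        request["SpeechMarkTypes"] = list(speech_mark_types)
    return request


def synthesize_to_file(text, path, **kwargs):
    """Call Amazon Polly and write the returned audio stream to `path`."""
    import boto3  # imported lazily so build_polly_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **kwargs))
    with open(path, "wb") as f:
        # AudioStream is a streaming body; read() returns the raw bytes.
        f.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("Baba went to the library", "baba.mp3")` would save the spoken sentence as an MP3 file, while passing `speech_mark_types=["word"]` would instead return the JSON speech marks described below.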
For example, if you input the text “Baba went to the library” into Amazon Polly and request word speech marks, the speech mark object for the word “went” would look as follows:
{"time":370,"type":"word","start":5,"end":9,"value":"went"}
The word "went" begins 370 milliseconds after the audio stream begins, and starts...
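Amazon Polly returns speech marks as newline-delimited JSON, with one object per line (one per word when word speech marks are requested). The following sketch, using only the Python standard library, parses such a stream and uses the `start` and `end` offsets to recover each word from the original input. Polly's documentation describes these offsets as byte offsets into the input text, so the sketch slices the UTF-8 encoded input. The function names are hypothetical, and the sample line simply reuses the object shown above.

```python
import json


def parse_speech_marks(stream_text):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


def word_for_mark(input_text, mark):
    """Return the slice of the input text that a speech mark refers to.

    start/end are treated as byte offsets into the UTF-8 encoded input,
    so the text is encoded before slicing and decoded afterwards.
    """
    data = input_text.encode("utf-8")
    return data[mark["start"]:mark["end"]].decode("utf-8")


text = "Baba went to the library"
marks = parse_speech_marks('{"time":370,"type":"word","start":5,"end":9,"value":"went"}\n')
for mark in marks:
    # Print when each word starts in the audio and which word it is.
    print(f'{mark["time"]} ms: {word_for_mark(text, mark)!r}')
```

Running this prints `370 ms: 'went'`. Slicing bytes rather than characters only matters for non-ASCII input; for an ASCII sentence like this one the two give the same result.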