The technology of text-to-speech synthesis
There are two main stages in text-to-speech synthesis:
Text analysis, where the text to be synthesized is analyzed and prepared for spoken output.
Waveform generation, where the analyzed text is converted into speech.
The text analysis stage raises many problems. For example, what is the correct pronunciation of the word staring? Is it based on star + ing or on stare + ing? Answering this question requires analysis of the structure of words; in this case, determining how the spelling of a root such as stare changes when a suffix such as ing is added.
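The staring example can be illustrated with a minimal sketch. It assumes a tiny hypothetical lexicon (ROOTS) and two simplified English spelling rules, silent-e deletion and final-consonant doubling; the names attach_ing and analyze are invented for illustration, and a real text analysis component would use a full dictionary and a far richer rule set.

```python
# Toy lexicon of known roots (hypothetical; stands in for a full dictionary).
ROOTS = {"star", "stare", "hop", "hope"}
VOWELS = set("aeiou")


def attach_ing(root: str) -> str:
    """Spell root + 'ing' using two simplified English spelling rules."""
    # Silent-e deletion: stare -> staring, hope -> hoping.
    if root.endswith("e") and not root.endswith("ee"):
        return root[:-1] + "ing"
    # Consonant doubling after a single vowel: star -> starring, hop -> hopping.
    # Simplification: only reliable for one-syllable roots.
    if (
        len(root) >= 3
        and root[-1] not in VOWELS
        and root[-1] not in "wxy"
        and root[-2] in VOWELS
        and root[-3] not in VOWELS
    ):
        return root + root[-1] + "ing"
    return root + "ing"


def analyze(word: str) -> list[str]:
    """Return every known root whose -ing form is spelled exactly like `word`."""
    return sorted(root for root in ROOTS if attach_ing(root) == word)


if __name__ == "__main__":
    for w in ("staring", "starring", "hoping", "hopping"):
        print(w, "->", analyze(w))
```

Under these assumed rules, staring can only come from stare, because star + ing would be spelled starring. Working forward from each candidate root and comparing spellings avoids having to undo the spelling changes directly, at the cost of scanning the lexicon.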
There are also words that have alternative pronunciations depending on their use in a particular sentence. For example, live as a verb will rhyme with give, but as an adjective it rhymes with five. The part of speech also affects stress assignment within a word; for example, record as a noun is pronounced 'record (with the stress...