The relationship between conversational AI and NLP
Conversational artificial intelligence is the broad label for an ecosystem of cooperating technologies that enable systems to conduct spoken and text-based conversations with people. These technologies include speech recognition, NLP, dialog management, natural language generation, and text-to-speech generation. It is important to distinguish these technologies, since they are frequently confused. While this book will focus on NLP, we will briefly define the other related technologies so that we can see how they all fit together:
- Speech recognition: This is also referred to as speech-to-text or automatic speech recognition (ASR). Speech recognition is the technology that starts with spoken audio and converts it to text.
- NLP: This starts with written language and produces a structured representation that can be processed by a computer. The input text can either be the output of speech recognition or text that was originally produced in written form. The structured representation can be said to express a user’s intent or purpose.
- Dialog management: This starts with the structured output of NLP and determines how a system should react. System reactions can include such actions as providing information, playing media, or getting more information from a user in order to address their intent.
- Natural language generation: This is the process of creating text that expresses the dialog manager’s response to a user’s utterance.
- Text-to-speech: This starts with the textual output of the natural language generation process and generates spoken audio.
This book focuses on the NLP component. However, because many natural language applications also use the other components, we will occasionally refer to them. The relationships among these components are shown in the following diagram of a complete spoken dialog system:
Figure 1.2 – A complete spoken dialog system
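To make the flow between the components concrete, the following is a minimal sketch of the same pipeline in Python. Every function, class, and intent name here (`recognize_speech`, `understand`, `manage_dialog`, `generate_text`, `synthesize_speech`, `Intent`, `get_weather`) is hypothetical and chosen only for illustration. Each stage is a toy stand-in: the "audio" is simply UTF-8 encoded text, and the NLP step is a keyword rule rather than a trained model. A real system would replace each stub with an actual component.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Structured representation produced by the NLP step (hypothetical)."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    # Stand-in for speech recognition: the toy "audio" is UTF-8 encoded text.
    return audio.decode("utf-8")


def understand(text: str) -> Intent:
    # Stand-in for NLP: a keyword rule instead of a trained model.
    words = text.lower().rstrip("?.!").split()
    if "weather" in words:
        city = None
        if "in" in words[:-1]:
            # Treat the word that follows "in" as the city name.
            city = words[words.index("in") + 1].title()
        return Intent("get_weather", {"city": city})
    return Intent("unknown")


def manage_dialog(intent: Intent) -> dict:
    # Decide how the system should react to the user's intent.
    if intent.name == "get_weather":
        if intent.slots.get("city") is None:
            # More information is needed before the intent can be addressed.
            return {"action": "ask_slot", "slot": "city"}
        return {"action": "inform_weather", "city": intent.slots["city"]}
    return {"action": "clarify"}


def generate_text(action: dict) -> str:
    # Stand-in for natural language generation: fixed templates.
    if action["action"] == "ask_slot":
        return f"Which {action['slot']} do you mean?"
    if action["action"] == "inform_weather":
        return f"Here is the weather for {action['city']}."
    return "Sorry, I did not understand that."


def synthesize_speech(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the text back into toy "audio".
    return text.encode("utf-8")


def respond(audio: bytes) -> bytes:
    # Chain the five components in the order described above.
    text = recognize_speech(audio)
    intent = understand(text)
    action = manage_dialog(intent)
    reply = generate_text(action)
    return synthesize_speech(reply)


if __name__ == "__main__":
    print(respond(b"What is the weather in Boston?").decode("utf-8"))
    print(respond(b"What is the weather?").decode("utf-8"))
```

Note that only `understand` corresponds to the NLP component that this book covers. The other four functions mark where the remaining technologies would plug in, and for a text-only application the first and last stages would simply be omitted.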
In the next two sections, we’ll summarize some important natural language applications. This will give you a taste of the potential of the technologies that will be covered in this book, and it will hopefully get you excited about the results that you can achieve with widely available tools.