Conversational user interfaces are as old as modern computers themselves. ENIAC, the first programmable general-purpose computer, was built in 1946. In 1950, Alan Turing, a British computer scientist, proposed measuring the level of intelligence in machines using a conversational test called the Turing test. The test pitted the machine against a human as dialogue partners for a set of human judges. The judges would interact with each of the two participants over a text-based interface, not unlike most modern messaging applications. Over chat, the judges had to identify which of the two participants was the machine. If at least 30% of the judges couldn't differentiate between the two, the machine was considered to have passed the test. This was one of the earliest reflections on conversational interfaces and their bearing on the intelligence of machines possessing such capabilities. However, for several decades that followed, attempts to build such interfaces were not very successful.
For about 35 years, since the 1980s, Graphical User Interfaces (GUIs) have dominated the way we interact with machines. With recent developments in AI and growing constraints such as the shrinking size of gadgets (from laptops to mobile phones), reduced on-screen real estate (smartwatches), and the need for interfaces to become invisible (smart homes and robots), conversational user interfaces are once again becoming a reality. For instance, the best way to interact with mobile robots and distributed gadgets in a smart home is by voice. Such systems should therefore be able to understand users' requests and responses in natural human language. These capabilities can reduce the effort users spend learning and navigating today's complex interfaces.
Conversational user interfaces have been known by several names: natural language interfaces, spoken dialogue systems, chatbots, intelligent virtual agents, virtual assistants, and so on. The actual differences between these systems lie in their backend integrations (for example, databases and task/control modules), modalities (for example, text, voice, and visual avatars), and the channels on which they are deployed. However, the common theme among these systems is their ability to interact with users conversationally using natural language.