You're reading from Learning Microsoft Cognitive Services Create intelligent apps with vision, speech, language, and knowledge capabilities using Microsoft Cognitive Services

Product type Paperback

Published in Mar 2017

Publisher Packt

ISBN-13 9781786467843

Length 372 pages

Edition 1st Edition

Languages

Tools

Microsoft Cognitive Services

Concepts

Artificial Intelligence

Author (1):

Leif Larsen Henning Larsen

View More author details

Table of Contents (14) Chapters

Preface

1. Getting Started with Microsoft Cognitive Services FREE CHAPTER

2. Analyzing Images to Recognize a Face

3. Analyzing Videos

4. Letting Applications Understand Commands

5. Speak with Your Application

6. Understanding Text

7. Extending Knowledge Based on Context

8. Querying Structured Data in a Natural Way

9. Adding Specialized Searches

10. Connecting the Pieces

Appendix A. LUIS Entities and Intents

LUIS pre-built entities

1. Appendix B. Additional Information on Linguistic Analysis

Phrase types

2. Appendix C. License Information

Overview of what we are dealing with

Now that you have seen a basic example of how to detect faces, it is time to learn a bit about what else Cognitive Services can do for you. When using Cognitive Services, you have 21 different APIs to hand. These are in turn separated into five top-level domains according to what they do. They are vision, speech, language, knowledge, and search. Let's look at more about them in the following sections.

Vision

APIs under the Vision flags allows your apps to understand images and video content. It allows you to retrieve information about faces, feelings, and other visual content. You can stabilize videos and recognize celebrities. You can read text in images and generate thumbnails from videos and images.

There are four APIs contained in the Vision area, which we will look at now.

Computer Vision

Using the Computer Vision API, you can retrieve actionable information from images. This means you can identify content (such as image format, image size, colors, faces, and more). You can detect whether or not an image is adult/racy. This API can recognize text in images and extract it to machine-readable words. It can detect celebrities from a variety of areas. Lastly it can generate storage-efficient thumbnails with smart cropping functionality.

We will look into Computer Vision in Chapter 2, Analyzing Images to Recognize a Face.

Emotion

The Emotion API allows you to recognize emotions, both in images and videos. This can allow for more personalized experiences in applications. Emotions detected are cross-cultural emotions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise.

We will cover Emotion API over two chapters, Chapter 2, Analyzing Images to Recognize a Face, for image-based emotions, and Chapter 3, Analyzing Videos, for video-based emotions.

Face

We have already seen the very basic example of what the Face API can do. The rest of the API revolves around the same, to detect, identify, organize, and tag faces in photos. Apart from face detection, you can see how likely it is that two faces belong to the same person. You can identify faces and also find similar-looking faces.

We will dive further into Face API in Chapter 2, Analyzing Images to Recognize a Face.

Video

The Video API is about the analyzing, editing, and processing of videos in your app. If you have a video that is shaky, the API allows you to stabilize it. You can detect and track faces in videos. If a video contains a stationary background, you can detect motion. The API lets you generate thumbnail summaries for videos, which allows users to see previews or snapshots quickly.

Video will be covered throughout Chapter 3, Analyzing Videos.

Speech

Adding one of the Speech APIs allows your application to hear and speak to your users. The APIs can filter noise and identify speakers. They can drive further actions in your application, based on the recognized intent.

Speech contains three APIs that are discussed as follows.

Bing Speech

Adding the Bing Speech API to your application allows you to convert speech to text and vice versa. You can convert spoken audio to text, either by utilizing a microphone or other sources in real-time, or by converting audio from files. The API also offers speech intent recognition, which is trained by Language Understanding Intelligent Service (LUIS) to understand the intent.

Speaker Recognition

The Speaker Recognition API gives your application the ability to know who is talking. By using this API, you can verify that the person speaking is who they claim to be. You can also determine who an unknown speaker is, based on a group of selected speakers.

Custom Recognition

To improve speech recognition, you can use the Custom Recognition API. This allows you to fine-tune speech recognition operations for anyone, anywhere. By using this API, the speech recognition model can be tailored to the vocabulary and speaking style of the user. In addition, the model can be customized to match the expected environment of the application.

We will cover all Speech related APIs in Chapter 5, Speak with Your Application.

Language

APIs related to language allows your application to process natural language and learn how to recognize what users want. You can add textual and linguistic analysis to your application, as well as natural language understanding.

The following five APIs can be found in the Language area.

Bing Spell Check

The Bing Spell Check API allows you to add advanced spell checking to your application.

This API will be covered in Chapter 6, Understanding Text.

Language Understanding Intelligent Service (LUIS)

LUIS is an API that can help your application understand commands from your users. Using this API, you can create language models that understand intents. By using models from Bing and Cortana, you can make these models recognize common requests and entities (such as places, times, and numbers). You can add conversational intelligence to your applications.

LUIS will be covered in Chapter 4, Let Applications Understand Commands.

Linguistic Analysis

The Linguistic Analysis API lets you parse complex text to explore the structure of text. By using this API you can find nouns, verbs, and more in text, which allows your application to understand who is doing what to whom.

We will see more of Linguistic Analysis in Chapter 6, Understanding Text.

Text Analysis

The Text Analysis API will help you in extracting information from text. You can find the sentiment of a text (whether the text is positive or negative). You will be able to detect language, topic, and key phrases used throughout the text.

We will also cover Text Analysis in Chapter 6, Understanding Text.

Web Language Model

By using the Web Language Model (WebLM) API you are able to leverage the power of language models trained on web-scale data. You can use this API to predict which words or sequences follow a given sequence or word.

Web Language Model API will be covered in Chapter 6, Understanding Text.

Knowledge

When talking about Knowledge APIs, we are talking about APIs that allow you to tap into rich knowledge. This may be knowledge from the web. It may be from academia or it may be your own data. Using these APIs, you will be able to explore different nuances of knowledge.

The following four APIs are contained in the Knowledge API area.

Academic

Using the Academic API, you can explore relationships among academic papers, journals, and authors. This API allows you to interpret natural language user query strings, which allow your application to anticipate what the user is typing. It will evaluate said expression and return academic knowledge entities.

This API will be covered more in Chapter 8, Query Structured Data in a Natural Way.

Entity Linking

Entity Linking is the API you would use to extend knowledge of people, places, and events based on the context. As you may know, a single word may be used differently based on the context. Using this API allows you to recognize and identify each separate entity within a paragraph, based on the context.

We will go through Entity Linking API in Chapter 7, Extending Knowledge Based on Context.

Knowledge Exploration

The Knowledge Exploration API will let you add the possibility to use interactive search for structured data in your projects. It interprets natural language queries and offers auto-completions to minimize user-effort. Based on the query expression received, it will retrieve detailed information about matching objects.

Details on this API will be covered in Chapter 8, Query Structured Data in a Natural Way.

Recommendations

The Recommendations API allows you to provide personalized product recommendations for your customers. You can use this API to add frequently bought together functionality to your application. Another feature you can add is item-to-item recommendations, which allow customers to see what other customers like. This API will also allow you to add recommendations based on the prior activity of the customer.

We will go through this API in Chapter 7, Extending Knowledge Based on Context.

Search

Search APIs give you the ability to make your applications more intelligent with the power of Bing. Using these APIs, you can use a single call to access data from billions of web pages, images, videos, and news.

The following five APIs are in the search domain.

Bing Web Search

With Bing Web Search you can search for details in billions of web documents indexed by Bing. All the results can be arranged and ordered according to a layout you specify, and the results are customized to the location of the end user.

Bing Image Search

Using the Bing Image Search API, you can add an advanced image and metadata search to your application. Results include URL to images, thumbnails, and metadata. You will also be able to get machine-generated captions, similar images, and more. This API allows you to filter the results based on image type, layout, freshness (how new the image is), and license.

Bing Video Search

Bing Video Search will allow you to search for videos and return rich results. The results contain metadata from the videos, static or motion based thumbnails, and the video itself. You can add filters to the result, based on freshness, video length, resolution, and price.

Bing News Search

If you add Bing News Search to your application, you can search for news articles. Results can include authoritative image, related news and categories, information on the provider, URL, and more. To be more specific you can filter news based on topics.

Bing Autosuggest

The Bing Autosuggest API is a small, but powerful one. It will allow your users to search faster with search suggestions, allowing you to connect a powerful search to your apps.

All Search APIs will be covered in Chapter 9, Adding Specialized Search.