Summary
In this chapter, we learned how to manage the Kinect sensor's audio stream and how to enhance the sensor's capabilities for speech recognition.
We worked mainly with the KinectAudioSource class, which manages the stream of either raw or modified audio from the microphone array. The audio stream can be processed with a variety of algorithms that improve its quality, including noise suppression, automatic gain control, and acoustic echo cancellation.
First of all, we introduced the concept of grammars for converting sounds into commands. Grammars are defined either in XML files or programmatically. To improve the accuracy of the speech recognition process, applications often require a specific prefix before each command. When implementing a grammar, it is good practice to define a speech command as a combination of an application-specific keyword plus the actual command. This decreases the chance of treating random words as an actual speech command.
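As a reminder of the prefix technique, here is a minimal sketch of an XML grammar in the W3C SRGS format. The prefix keyword "kinect", the commands "start" and "stop", and the rule name "command" are hypothetical values chosen for illustration; they are not taken from the chapter's samples.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="command"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command" scope="public">
    <!-- Application-specific prefix (hypothetical): must be spoken first -->
    <item>kinect</item>
    <!-- The actual command (hypothetical): exactly one of these must follow -->
    <one-of>
      <item>start <tag>out = "START";</tag></item>
      <item>stop <tag>out = "STOP";</tag></item>
    </one-of>
  </rule>
</grammar>
```

With a grammar of this shape, the phrase "kinect start" matches, whereas the word "start" alone does not, so a word picked up from background conversation is less likely to be treated as a command.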
While working with...