The Kinect SDK architecture for Audio
The SDK installs the Kinect USB Audio components, which interact with both the microphone array of the Kinect sensor and the SDK components. For speech recognition, Kinect uses the underlying speech API of the Windows operating system. Kinect has its own internal pipeline that processes the captured audio data; at the operating system level, however, the audio API is built on existing audio framework components. As the following diagram shows, the captured audio from the Kinect microphone array is passed to the application via the Kinect and Windows Audio Components:
Along with the device drivers, the following are the two major components:
DirectX Media Object (DMO)
Windows Speech Recognition API (SAPI)
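The layering described above can be pictured as a chain of stages, each handing its output to the next. The following is only an illustrative sketch in Python, not Kinect SDK code: the `AudioFrame` type, the `make_stage` helper, and the exact ordering of the stages are assumptions made for the example, and the stages do no real signal processing; they only record the route a frame takes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AudioFrame:
    """A chunk of captured audio plus a record of the layers it passed through."""
    samples: List[float]
    path: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[AudioFrame], AudioFrame]:
    """Build a placeholder stage that only records its name (no real DSP)."""
    def stage(frame: AudioFrame) -> AudioFrame:
        # Return a new frame so earlier frames are never mutated.
        return AudioFrame(frame.samples, frame.path + [name])
    return stage


# Hypothetical stage names mirroring the layers named in the text;
# the ordering is an assumption made for illustration.
PIPELINE = [
    make_stage("microphone array"),
    make_stage("Kinect USB Audio components"),
    make_stage("DMO"),
    make_stage("application"),
]


def run_pipeline(frame: AudioFrame) -> AudioFrame:
    """Pass the frame through every stage in order."""
    for stage in PIPELINE:
        frame = stage(frame)
    return frame


result = run_pipeline(AudioFrame(samples=[0.0, 0.1, -0.1]))
print(" -> ".join(result.path))
```

Modelling each layer as a function from frame to frame keeps the layers independent of one another, which is the property the architecture relies on: the application only sees the output of the last stage and does not need to know how the earlier ones work.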
The majority of audio functionality, such as Noise Suppression (NS), Acoustic Echo Cancellation (AEC), and Automatic Gain Control (AGC), is controlled by the DMO. However, these are not new functionalities for the DMO; the...