Audio stream data – recording and injecting
As stated previously, the Kinect Studio tool currently delivered by Microsoft does not support recording and injecting audio stream data.
In this appendix, we provide a simple, primitive tool for recording speech input and submitting it to the speech recognition engine and the grammar we defined.
We encourage you to take the idea further and build a more complex and user-friendly Kinect Audio/Studio type of application.
The idea behind the tool is very simple. You record your audio input as a .wav file and then inject it into the speech recognition engine to debug and test the audio stream processing.
You may also want to use a different .wav file to see how the speech recognition engine copes with other people's pronunciation, or with environmental conditions that differ from those where you are currently testing your application. Have you ever thought of developing an application that captures commands from a song? Or what about building a chaos monkey (a small tool that tests the reliability of your application) that injects a nonsense .wav file into your application? How does the application react to that?
As you may remember, in Chapter 4, Speech Recognition, we enabled the speech recognition process by calling the key SetInputToAudioStream API of the SpeechRecognitionEngine class to process the AudioSource stream of the KinectSensor (please refer to the following code snippet). This enabled our application to try to recognize all the speech input streamed in by the Kinect sensor:
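// Kinect audio: PCM, 16,000 samples/s, 16 bits per sample, 1 channel,
// 32,000 average bytes/s, 2-byte block alignment
speechEngine.SetInputToAudioStream(
    sensor.AudioSource.Start(),
    new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
speechEngine.RecognizeAsync(RecognizeMode.Multiple);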
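The last numeric arguments of SpeechAudioFormatInfo are not independent: for PCM audio, the block alignment and the average byte rate follow from the sample rate, bit depth, and channel count. The following sketch shows that arithmetic; PcmFormatMath is a hypothetical helper, not part of the Kinect SDK or the speech APIs.

```csharp
// Hypothetical helper (not part of the Kinect SDK or the speech APIs) that
// derives the dependent PCM parameters passed to SpeechAudioFormatInfo.
public static class PcmFormatMath
{
    // Bytes in one sample frame, across all channels.
    public static int BlockAlign(int channelCount, int bitsPerSample)
    {
        return channelCount * bitsPerSample / 8;
    }

    // Bytes of audio produced per second.
    public static int AverageBytesPerSecond(int samplesPerSecond, int channelCount, int bitsPerSample)
    {
        return samplesPerSecond * BlockAlign(channelCount, bitsPerSample);
    }
}
```

For the Kinect audio stream (16,000 samples per second, 16 bits, one channel), BlockAlign(1, 16) gives 2 and AverageBytesPerSecond(16000, 1, 16) gives 32000, matching the arguments in the snippet above.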
The SpeechRecognitionEngine class also provides the SetInputToWaveFile method, which configures the engine to receive its input from a .wav file. We can therefore load a .wav file recorded in advance with the following code:
speechEngine.SetInputToWaveFile("COMMAND_TO_TEST.WAV");
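Because SetInputToWaveFile takes no format argument, the engine uses whatever format the file's header declares. Before comparing results from a file recorded elsewhere with live Kinect input, it may be worth confirming that the file really is 16 kHz, 16-bit, mono PCM. The following is a minimal sketch under that assumption; WavFormatCheck is a hypothetical helper, not part of the speech APIs, and it expects a seekable stream.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not part of the speech APIs): checks whether a RIFF/WAVE
// stream holds PCM, 16 kHz, 16-bit, mono audio, the format the Kinect produces.
// The stream must be seekable (a FileStream opened on a .wav file is).
public static class WavFormatCheck
{
    public static bool IsKinectPcmFormat(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            // RIFF header: "RIFF", chunk size, "WAVE"
            if (ReadTag(reader) != "RIFF") return false;
            reader.ReadInt32();
            if (ReadTag(reader) != "WAVE") return false;

            // Walk the chunks until the "fmt " chunk is found
            while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
            {
                string tag = ReadTag(reader);
                int size = reader.ReadInt32();
                if (size < 0) return false; // corrupt size field

                if (tag != "fmt ")
                {
                    // Skip the chunk body; RIFF pads odd-sized chunks with one byte
                    reader.BaseStream.Seek((long)size + (size & 1), SeekOrigin.Current);
                    continue;
                }

                if (size < 16) return false; // too short to hold the PCM fields
                ushort formatTag = reader.ReadUInt16();
                ushort channels = reader.ReadUInt16();
                uint samplesPerSec = reader.ReadUInt32();
                reader.ReadUInt32(); // nAvgBytesPerSec
                reader.ReadUInt16(); // nBlockAlign
                ushort bitsPerSample = reader.ReadUInt16();
                return formatTag == 1 && channels == 1
                    && samplesPerSec == 16000 && bitsPerSample == 16;
            }
            return false;
        }
    }

    static string ReadTag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

For example, you could open COMMAND_TO_TEST.WAV with File.OpenRead, call IsKinectPcmFormat, and only then pass the path to SetInputToWaveFile.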
The speech recognition process will then be exactly the same as the one we saw in Chapter 4. To save the audio captured by the Kinect sensor, we can use the Recorder class, which saves the audio stream in the .wav file format:
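using System.IO;
using System.Text;
using Microsoft.Kinect;

sealed class Recorder
{
    static byte[] buffer = new byte[4096];

    // volatile: written by the UI thread, read by the background recording loop
    static volatile bool isRecording;

    public static bool IsRecording
    {
        get { return isRecording; }
        set { isRecording = value; }
    }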
The data format of a wave audio stream is defined by the WAVEFORMATEX structure:
struct WAVEFORMATEX
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}
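The native WAVEFORMATEX is declared with byte packing, so it occupies 2 + 2 + 4 + 4 + 2 + 2 + 2 = 18 bytes, which is why the header code writes a format chunk of 18 bytes. The struct above does not need a layout attribute because its fields are written one at a time, but the sketch below adds [StructLayout] to two hypothetical copies of it so the marshaler can confirm the packed size and show what default alignment would produce instead.

```csharp
using System.Runtime.InteropServices;

// Same fields as WAVEFORMATEX, byte-packed like the native declaration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedWaveFormat
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

// Same fields with default sequential layout: the marshaled size is rounded up
// to a multiple of the largest field alignment (4 bytes, for uint).
[StructLayout(LayoutKind.Sequential)]
public struct DefaultWaveFormat
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

public static class WaveFormatSizes
{
    public static int Packed() { return Marshal.SizeOf(typeof(PackedWaveFormat)); }
    public static int Default() { return Marshal.SizeOf(typeof(DefaultWaveFormat)); }
}
```

Packed() reports 18 bytes and Default() reports 20, so a struct marshaled without Pack = 1 would not match the native layout.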
Note
More details on the structure's members can be found in the Microsoft documentation at http://msdn.microsoft.com/en-us/library/windows/hardware/ff538799(v=vs.85).aspx.
A complete list of the WAVE_FORMAT_XXX format tags (such as WAVE_FORMAT_PCM for one- or two-channel PCM data) can be found in the Mmreg.h header file.
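The header code in this appendix writes wFormatTag = 1, which is the value of WAVE_FORMAT_PCM. As an illustration, the sketch below mirrors a few well-known tags as C# constants; the WaveFormatTags class and its Describe method are hypothetical, and the Windows SDK headers remain the authoritative list.

```csharp
// Hypothetical mirror of a few WAVE_FORMAT_XXX values from the Windows SDK headers.
public static class WaveFormatTags
{
    public const ushort WAVE_FORMAT_PCM = 0x0001;
    public const ushort WAVE_FORMAT_ADPCM = 0x0002;
    public const ushort WAVE_FORMAT_IEEE_FLOAT = 0x0003;
    public const ushort WAVE_FORMAT_ALAW = 0x0006;
    public const ushort WAVE_FORMAT_MULAW = 0x0007;
    public const ushort WAVE_FORMAT_EXTENSIBLE = 0xFFFE;

    // Returns the symbolic name of a format tag read from a .wav header.
    public static string Describe(ushort tag)
    {
        switch (tag)
        {
            case WAVE_FORMAT_PCM: return "WAVE_FORMAT_PCM";
            case WAVE_FORMAT_ADPCM: return "WAVE_FORMAT_ADPCM";
            case WAVE_FORMAT_IEEE_FLOAT: return "WAVE_FORMAT_IEEE_FLOAT";
            case WAVE_FORMAT_ALAW: return "WAVE_FORMAT_ALAW";
            case WAVE_FORMAT_MULAW: return "WAVE_FORMAT_MULAW";
            case WAVE_FORMAT_EXTENSIBLE: return "WAVE_FORMAT_EXTENSIBLE";
            default: return "unknown (0x" + tag.ToString("X4") + ")";
        }
    }
}
```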
With the WriteWavHeader method, we create the header of the .wav file:
// Support method used by WriteWavHeader: writes an ASCII chunk tag such as "RIFF"
static void WriteString(Stream stream, string s)
{
    byte[] bytes = Encoding.ASCII.GetBytes(s);
    stream.Write(bytes, 0, bytes.Length);
}

public static void WriteWavHeader(Stream stream, int dataLength)
{
    using (MemoryStream memStream = new MemoryStream(64))
    {
        // Size of the "fmt " chunk body: the 18-byte WAVEFORMATEX structure
        int cbFormat = 18;
        WAVEFORMATEX format = new WAVEFORMATEX()
        {
            wFormatTag = 1,          // WAVE_FORMAT_PCM
            nChannels = 1,           // mono
            nSamplesPerSec = 16000,  // 16 kHz
            nAvgBytesPerSec = 32000, // nSamplesPerSec * nBlockAlign
            nBlockAlign = 2,         // nChannels * wBitsPerSample / 8
            wBitsPerSample = 16,
            cbSize = 0               // no extra format data
        };
        using (var bw = new BinaryWriter(memStream))
        {
            // RIFF chunk: the size field counts everything after it, i.e.
            // "WAVE" (4) + fmt chunk (8 + cbFormat) + data chunk (8 + dataLength)
            WriteString(memStream, "RIFF");
            bw.Write(dataLength + cbFormat + 20);
            WriteString(memStream, "WAVE");

            // Format chunk
            WriteString(memStream, "fmt ");
            bw.Write(cbFormat);
            bw.Write(format.wFormatTag);
            bw.Write(format.nChannels);
            bw.Write(format.nSamplesPerSec);
            bw.Write(format.nAvgBytesPerSec);
            bw.Write(format.nBlockAlign);
            bw.Write(format.wBitsPerSample);
            bw.Write(format.cbSize);

            // Data chunk header; the audio samples follow it
            WriteString(memStream, "data");
            bw.Write(dataLength);

            memStream.WriteTo(stream);
        }
    }
}
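The sizes in a RIFF/WAVE header follow from the chunk layout: every chunk starts with a 4-byte tag and a 4-byte size, and the RIFF size field counts every byte after itself, namely the "WAVE" tag plus the complete format and data chunks. The following sketch computes these values; WavHeaderMath is a hypothetical name used only for illustration.

```csharp
// Hypothetical helper computing the RIFF/WAVE header sizes.
public static class WavHeaderMath
{
    // Every chunk starts with a 4-byte tag and a 4-byte size field.
    const int TagAndSize = 8;

    // Value of the RIFF chunk's size field: "WAVE" + fmt chunk + data chunk.
    public static int RiffChunkSize(int dataLength, int cbFormat)
    {
        return 4 + (TagAndSize + cbFormat) + (TagAndSize + dataLength);
    }

    // Bytes written before the first audio sample:
    // RIFF tag and size, "WAVE", the complete fmt chunk, the data chunk header.
    public static int HeaderLength(int cbFormat)
    {
        return TagAndSize + 4 + (TagAndSize + cbFormat) + TagAndSize;
    }
}
```

With cbFormat = 18, the header is 46 bytes long and the RIFF size is dataLength + 38. The widely used 44-byte header has the same layout with a 16-byte format chunk that omits cbSize, which PCM readers also accept. Because the samples are 16-bit, dataLength is always even, so the data chunk never needs a padding byte.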
The WriteWavFile method writes the audio stream of the Kinect audio source to the .wav file:
public static void WriteWavFile(KinectAudioSource sourceAudio, FileStream fileStream)
{
    var size = 0;
    // Write a placeholder header; the data length is not known yet
    WriteWavHeader(fileStream, size);
    using (var audioStream = sourceAudio.Start())
    {
        int count;
        // Copy audio until recording is stopped or the stream ends,
        // writing only the bytes actually read
        while (isRecording && (count = audioStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            fileStream.Write(buffer, 0, count);
            size += count;
        }
        // Rewrite the header with the final data length, then restore the position
        long prePosition = fileStream.Position;
        fileStream.Seek(0, SeekOrigin.Begin);
        WriteWavHeader(fileStream, size);
        fileStream.Seek(prePosition, SeekOrigin.Begin);
        fileStream.Flush();
    }
}
}
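Because the recording loop counts the bytes it copies, the length of a recording follows directly from the format: at 32,000 bytes per second, each 4,096-byte buffer holds 128 ms of audio. A minimal sketch, using a hypothetical RecordingMath helper and assuming the Kinect format described above:

```csharp
// Hypothetical helper relating byte counts to audio duration for the Kinect
// format (16 kHz, 16-bit, mono PCM = 32,000 bytes per second).
public static class RecordingMath
{
    public const int BytesPerSecond = 32000;

    public static double DurationSeconds(long dataLength)
    {
        return (double)dataLength / BytesPerSecond;
    }

    // Longest recording whose byte count still fits in an int counter.
    public static double MaxSecondsForIntCounter()
    {
        return DurationSeconds(int.MaxValue);
    }
}
```

An int byte counter therefore covers roughly 67,108 seconds (about 18.6 hours) of audio, far more than a voice command needs.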
We use the Recorder class inside our application by invoking the RecordAudio method:
private static object lockObject = new object();

private void RecordAudio()
{
    lock (lockObject)
    {
        Recorder.IsRecording = true;
        using (var fileStream = new FileStream("COMMAND.WAV", FileMode.Create))
        {
            Recorder.WriteWavFile(this.sensor.AudioSource, fileStream);
        }
    }
}
To keep our WPF application responsive to user input while it records the audio data streamed in by the Kinect sensor, we need to use a background worker. The following code snippet shows how to define the background worker and invoke the RecordAudio method as the work it performs. The complete source code is provided in the code attached to this appendix:
private BackgroundWorker bgW = new System.ComponentModel.BackgroundWorker();
…
this.bgW.RunWorkerCompleted += backgroundWorker1_RunWorkerCompleted;
this.bgW.DoWork += backgroundWorker1_DoWork;
…
void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    RecordAudio();
}
…
Recorder.IsRecording = true;
if (!this.bgW.IsBusy)
{
    this.bgW.RunWorkerAsync();
}
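To see the start/stop handshake between the UI code and the background copy loop in isolation, the following sketch replaces the Kinect audio stream with any readable Stream (for example, a MemoryStream), an assumption made so that it runs without a sensor. RecordingSession and its members are hypothetical names. Clearing the volatile flag plays the role of setting Recorder.IsRecording to false, which ends the loop after the current read.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Threading;

// Hypothetical one-shot recording session: copies source to destination on a
// BackgroundWorker until Stop() is called or the source stream ends.
public sealed class RecordingSession
{
    private readonly BackgroundWorker worker = new BackgroundWorker();
    private readonly ManualResetEventSlim completed = new ManualResetEventSlim(false);
    private volatile bool isRecording;

    public long BytesCopied { get; private set; }

    public RecordingSession(Stream source, Stream destination)
    {
        worker.DoWork += (sender, e) =>
        {
            var buffer = new byte[4096];
            int count;
            // Same loop shape as WriteWavFile: stop on request or at end of stream
            while (isRecording && (count = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, count);
                BytesCopied += count;
            }
        };
        // Signal waiters once the worker has finished (also raised if DoWork throws)
        worker.RunWorkerCompleted += (sender, e) => completed.Set();
    }

    public void Start()
    {
        if (worker.IsBusy) return;
        isRecording = true;
        worker.RunWorkerAsync();
    }

    // Lets the copy loop exit after its current read
    public void Stop()
    {
        isRecording = false;
    }

    public bool WaitForCompletion(TimeSpan timeout)
    {
        return completed.Wait(timeout);
    }
}
```

In a WPF application, avoid calling WaitForCompletion on the UI thread. BackgroundWorker raises RunWorkerCompleted through the synchronization context captured by RunWorkerAsync, which is the UI thread there, so blocking that thread would deadlock. Handle the end of the recording in the RunWorkerCompleted handler instead.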