Kinect in Motion - Audio and Visual Tracking by Example

Product type Paperback
Published in Apr 2013
Publisher Packt
ISBN-13 9781849697187
Length 112 pages
Edition 1st Edition

Audio stream data – recording and injecting


As stated previously, the version of Kinect Studio currently delivered by Microsoft does not support recording and injecting audio stream data.

In this appendix, we provide a simple, primitive tool that records speech input and submits it to the speech recognition engine and the grammar we defined.

We encourage you to take the idea further and build a more complete and user-friendly Kinect Studio-style application for audio.

The idea behind the tool is very simple. You record your audio input as a .wav file, inject it into the speech recognition engine, and then debug and test the audio stream processing.

You may want to use a different .wav file to see how the speech recognition engine copes with other people's pronunciation, or with environmental characteristics that differ from those where you are currently testing your application. Have you ever thought of developing an application that captures commands from a song? Or what about a chaos monkey type of test (a chaos monkey is a small tool for testing the reliability of your application) that injects a nonsense .wav file into your application? How does the application react to that?
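One way to produce a nonsense input for such a test is to generate a .wav file filled with random samples. The following is a minimal sketch, not part of the book's sample code: the NoiseWavGenerator class, its WriteNoiseWav method, and the output path are hypothetical names, and the file uses a plain 16-byte PCM format block with the 16 kHz, 16-bit, mono layout used elsewhere in this appendix.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the book's code.
static class NoiseWavGenerator
{
    // Writes a 16 kHz, 16-bit, mono PCM .wav file filled with random samples.
    public static void WriteNoiseWav(string path, double seconds, int seed)
    {
        const int sampleRate = 16000;      // samples per second
        const short channels = 1;          // mono
        const short bitsPerSample = 16;
        const short blockAlign = (short)(channels * bitsPerSample / 8);

        int sampleCount = (int)(sampleRate * seconds);
        int dataLength = sampleCount * blockAlign;
        var random = new Random(seed);

        using (var stream = new FileStream(path, FileMode.Create))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataLength);            // file size minus 8 bytes
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                         // size of the PCM format block
            writer.Write((short)1);                   // WAVE_FORMAT_PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * blockAlign);    // average bytes per second
            writer.Write(blockAlign);
            writer.Write(bitsPerSample);
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataLength);

            // Random 16-bit samples, that is, white noise
            for (int i = 0; i < sampleCount; i++)
            {
                writer.Write((short)random.Next(short.MinValue, short.MaxValue + 1));
            }
        }
    }
}
```

Calling NoiseWavGenerator.WriteNoiseWav("NOISE.WAV", 1.0, 42) should produce a one-second file that can then be injected in the same way as a recorded command. A fixed seed keeps the test input reproducible between runs.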

As you may remember, in Chapter 4, Speech Recognition, we enabled the speech recognition process by calling the key SetInputToAudioStream API of the SpeechRecognitionEngine class to process the AudioSource streamed out from the KinectSensor (see the following code snippet). This enabled our application to try to recognize all the speech input streamed in by the Kinect sensor:

speechEngine.SetInputToAudioStream(
    sensor.AudioSource.Start(),
    new SpeechAudioFormatInfo(
        EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
speechEngine.RecognizeAsync(RecognizeMode.Multiple);

The SpeechRecognitionEngine class also provides the SetInputToWaveFile method, which lets us take input from a .wav file. We can therefore load a .wav file that we recorded in advance with the following code:

speechEngine.SetInputToWaveFile("COMMAND_TO_TEST.WAV");

The speech recognition process is the same one we saw in the previous chapter. To save the audio captured by the Kinect sensor, we can use the Recorder class, which writes the audio stream to a .wav file:

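using System.IO;
using System.Text;
using Microsoft.Kinect;

    sealed class Recorder
    {   // Buffer that receives each chunk read from the Kinect audio stream
        static byte[] buffer = new byte[4096];
        // Written by the UI thread and read by the recording thread
        static volatile bool isRecording;
        public static bool IsRecording
        {   get { return isRecording; }
            set { isRecording = value; }
        }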

The data format of a wave audio stream is defined by the WAVEFORMATEX structure:

        struct WAVEFORMATEX
        {   public ushort   wFormatTag;
            public ushort   nChannels;
            public uint     nSamplesPerSec;
            public uint     nAvgBytesPerSec;
            public ushort   nBlockAlign;
            public ushort   wBitsPerSample;
            public ushort   cbSize;
        }

Note

The members of this structure are described in the Microsoft reference at http://msdn.microsoft.com/en-us/library/windows/hardware/ff538799(v=vs.85).aspx.

A complete list of the WAVE_FORMAT_XXX formats (WAVE_FORMAT_PCM is used for one- or two-channel PCM data) can be found in the Mmreg.h header file.
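For PCM data, two of the WAVEFORMATEX members follow from the others: nBlockAlign is the size in bytes of one sample across all channels, and nAvgBytesPerSec is the sample rate multiplied by that size. The sketch below only illustrates the arithmetic; the PcmFormatMath class and its method names are hypothetical and not part of the book's code.

```csharp
using System;

// Hypothetical helper for illustration; not part of the book's code.
static class PcmFormatMath
{
    // Bytes occupied by one sample frame (one sample for every channel).
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio data consumed per second.
    public static int AvgBytesPerSec(int samplesPerSec, int channels, int bitsPerSample)
    {
        return samplesPerSec * BlockAlign(channels, bitsPerSample);
    }

    // Duration, in seconds, of a block of PCM data of the given length.
    public static double DurationSeconds(int dataLength, int samplesPerSec,
                                         int channels, int bitsPerSample)
    {
        return (double)dataLength / AvgBytesPerSec(samplesPerSec, channels, bitsPerSample);
    }
}
```

With the format used in this appendix (16,000 samples per second, 16 bits per sample, one channel), BlockAlign gives 2 and AvgBytesPerSec gives 32,000, which are the values passed to SpeechAudioFormatInfo earlier.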

With the WriteWavHeader method we create the header of the .wav file:

        // Support method used by WriteWavHeader to write a four-character tag
        static void WriteString(Stream stream, string s)
        {   byte[] bytes = Encoding.ASCII.GetBytes(s);
            stream.Write(bytes, 0, bytes.Length);
        }

        public static void WriteWavHeader(Stream stream, int dataLength)
        {   // The header is built in a MemoryStream because disposing the
            // BinaryWriter also closes the stream it wraps
            using (MemoryStream memStream = new MemoryStream(64))
            {   int cbFormat = 18; // size in bytes of WAVEFORMATEX
                WAVEFORMATEX format = new WAVEFORMATEX()
                {   wFormatTag = 1,          // WAVE_FORMAT_PCM
                    nChannels = 1,
                    nSamplesPerSec = 16000,
                    nAvgBytesPerSec = 32000,
                    nBlockAlign = 2,
                    wBitsPerSample = 16,
                    cbSize = 0
                };

                using (var bw = new BinaryWriter(memStream))
                {   WriteString(memStream, "RIFF");
                    // File size minus 8: "WAVE" (4) + "fmt " chunk header (8)
                    // + format block + "data" chunk header (8) + data
                    bw.Write(dataLength + cbFormat + 20);
                    WriteString(memStream, "WAVE");
                    WriteString(memStream, "fmt ");
                    bw.Write(cbFormat);
                    bw.Write(format.wFormatTag);
                    bw.Write(format.nChannels);
                    bw.Write(format.nSamplesPerSec);
                    bw.Write(format.nAvgBytesPerSec);
                    bw.Write(format.nBlockAlign);
                    bw.Write(format.wBitsPerSample);
                    bw.Write(format.cbSize);
                    WriteString(memStream, "data");
                    bw.Write(dataLength);
                    memStream.WriteTo(stream);
                }
            }
        }
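To check a file produced by a writer like the one above, it can help to read the header back and inspect its fields. The following is a minimal sketch under stated assumptions: the WavHeaderInfo and WavHeaderReader names are hypothetical, and the reader assumes the "fmt " chunk immediately follows "WAVE" and the "data" chunk immediately follows the format block, which is the layout written in this appendix but is not guaranteed for .wav files in general.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical types for illustration; not part of the book's code.
sealed class WavHeaderInfo
{
    public int RiffSize;
    public ushort FormatTag;
    public ushort Channels;
    public uint SamplesPerSec;
    public uint AvgBytesPerSec;
    public ushort BlockAlign;
    public ushort BitsPerSample;
    public int DataLength;
}

static class WavHeaderReader
{
    static void ExpectTag(BinaryReader reader, string expected)
    {
        string actual = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (actual != expected)
            throw new InvalidDataException(
                "Expected '" + expected + "' but found '" + actual + "'.");
    }

    public static WavHeaderInfo Read(Stream stream)
    {
        // The reader is not disposed, so the caller's stream stays open
        var reader = new BinaryReader(stream);
        var info = new WavHeaderInfo();

        ExpectTag(reader, "RIFF");
        info.RiffSize = reader.ReadInt32();
        ExpectTag(reader, "WAVE");
        ExpectTag(reader, "fmt ");

        int cbFormat = reader.ReadInt32();
        if (cbFormat < 16)
            throw new InvalidDataException("Format block is too short.");

        info.FormatTag = reader.ReadUInt16();
        info.Channels = reader.ReadUInt16();
        info.SamplesPerSec = reader.ReadUInt32();
        info.AvgBytesPerSec = reader.ReadUInt32();
        info.BlockAlign = reader.ReadUInt16();
        info.BitsPerSample = reader.ReadUInt16();
        reader.ReadBytes(cbFormat - 16);   // skip cbSize and any extension bytes

        ExpectTag(reader, "data");
        info.DataLength = reader.ReadInt32();
        return info;
    }
}
```

Opening a recorded file with File.OpenRead and passing the stream to WavHeaderReader.Read should report one channel, 16,000 samples per second, and a DataLength equal to the number of audio bytes that follow the header.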

The WriteWavFile method writes the Kinect audio source to the .wav file:

        public static void WriteWavFile(KinectAudioSource sourceAudio,
            FileStream fileStream)
        {   var size = 0;
            // Write a placeholder header; the data length is not known yet
            WriteWavHeader(fileStream, size);

            using (var audioStream = sourceAudio.Start())
            {   int count;
                // Read returns the number of bytes actually placed in the buffer
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0
                       && isRecording)
                {   fileStream.Write(buffer, 0, count);
                    size += count;
                }
                // Rewrite the header now that the data length is known
                long prePosition = fileStream.Position;
                fileStream.Seek(0, SeekOrigin.Begin);
                WriteWavHeader(fileStream, size);
                fileStream.Seek(prePosition, SeekOrigin.Begin);
                fileStream.Flush();
            }
        }
    }
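Stream.Read returns the number of bytes it actually placed in the buffer, which can be fewer than the number requested, so a recording loop should write and count only that many bytes. The sketch below isolates the pattern with ordinary streams; the StreamCopy class and the CopyWhile method are hypothetical names, and the keepGoing delegate stands in for a flag such as IsRecording.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the book's code.
static class StreamCopy
{
    // Copies from source to destination until the source is exhausted or
    // keepGoing returns false. Returns the number of bytes copied.
    public static int CopyWhile(Stream source, Stream destination, Func<bool> keepGoing)
    {
        var buffer = new byte[4096];
        int total = 0;
        int count;
        while (keepGoing() && (count = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were actually read
            destination.Write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

Copying a 10,000-byte MemoryStream this way yields exactly 10,000 bytes. A loop that always wrote the full 4,096-byte buffer would instead emit 12,288 bytes, with stale data at the end and a data length in the header that no longer matches the audio.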

We use the Recorder class from our application simply by calling the RecordAudio method:

private static readonly object lockObject = new object();
private void RecordAudio()
{
    // Allow only one recording at a time
    lock (lockObject)
    {
        Recorder.IsRecording = true;
        using (var fileStream = new
            FileStream("COMMAND.WAV", FileMode.Create))
        {
            Recorder.WriteWavFile(this.sensor.AudioSource, fileStream);
        }
    }
}

To keep our WPF application responsive to user input while it records the audio data streamed in by the Kinect sensor, we need to use a background worker. The following code snippet shows how to define the background worker and invoke the RecordAudio method as the work it performs when it runs. The complete source code is provided in the code attached to this appendix:

private BackgroundWorker bgW =
    new System.ComponentModel.BackgroundWorker();
…
this.bgW.RunWorkerCompleted += backgroundWorker1_RunWorkerCompleted;
this.bgW.DoWork += backgroundWorker1_DoWork;
…
void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    RecordAudio();
}
…
Recorder.IsRecording = true;
if (!this.bgW.IsBusy)
{
    this.bgW.RunWorkerAsync();
}
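The start and stop cycle of such a worker can be tried without a Kinect sensor. The following is a minimal console sketch, assuming a simulated recording loop in place of RecordAudio: the RecordingWorkerDemo class, its Run method, and the chunk counter are hypothetical, while BackgroundWorker, DoWork, RunWorkerCompleted, IsBusy, and RunWorkerAsync are the standard System.ComponentModel members used above.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Hypothetical demo for illustration; not part of the book's code.
static class RecordingWorkerDemo
{
    // Plays the role of Recorder.IsRecording
    static volatile bool isRecording;

    // Runs a simulated recording for the given time and returns the
    // number of simulated chunks processed before the stop request.
    public static int Run(int recordForMilliseconds)
    {
        int chunks = 0;
        using (var finished = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            worker.DoWork += (sender, e) =>
            {
                int count = 0;
                while (isRecording)        // stands in for the read loop
                {
                    Thread.Sleep(10);      // pretend to read one chunk
                    count++;
                }
                e.Result = count;
            };
            worker.RunWorkerCompleted += (sender, e) =>
            {
                chunks = (int)e.Result;
                finished.Set();
            };

            isRecording = true;
            if (!worker.IsBusy)
            {
                worker.RunWorkerAsync();
            }

            Thread.Sleep(recordForMilliseconds);
            isRecording = false;           // what a Stop button would do
            finished.WaitOne();            // wait for RunWorkerCompleted
        }
        return chunks;
    }
}
```

In a WPF application the calling thread would not sleep or wait; a Stop button handler would simply set the flag to false, and RunWorkerCompleted would run on the UI thread, where it is safe to update controls.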