Kinect in Motion - Audio and Visual Tracking by Example
Kinect in Motion - Audio and Visual Tracking by Example: Start building for the Kinect today by capturing gestures, movements, and spoken voice commands

Chapter 1. Kinect for Windows – Hardware and SDK Overview

In this chapter we will define the key notions and tips for the following topics:

  • Critical hardware components of the Kinect for Windows device and their functionalities, properties, and limits

  • Software architecture defining the Kinect SDK 1.6

Motion computing and Kinect


Before getting Kinect in motion, let's try to understand what motion computing (or motion control computing) is and how Kinect built its success in this area.

Motion control computing is the discipline that detects, digitizes, and processes the position and/or velocity of people and objects in order to interact with software systems.

Motion control computing has been establishing itself as one of the most relevant techniques for designing and implementing a Natural User Interface (NUI).

NUIs are human-machine interfaces that enable the user to interact in a natural way with software systems. The goals of NUIs are to be natural and intuitive. NUIs are built on the following two main principles:

  • The NUI has to be imperceptible, thanks to its intuitive characteristics. A sensor able to capture our gestures, a microphone able to capture our voice, and a touch screen able to capture our hands' movements are all imperceptible to us because their use is intuitive. The interface does not distract us from the core functionalities of our software system.

  • The NUI is based on nature or natural elements. The slide gesture, the touch, the body movements, and the voice commands are all natural actions that do not divert us from our normal behavior.

NUIs are becoming crucial for improving the accessibility of software solutions. Programming a NUI is very important nowadays, and the field will continue to evolve in the future.

Kinect embraces the NUI principles and provides a powerful multimodal interface to the user. We can interact with complex software applications and/or video games simply by using our voice and our natural gestures. Kinect can detect our body position, the velocity of our movements, and our voice commands. It can detect the position of objects too.

Microsoft started to develop Kinect as a secret project in 2006 within the Xbox division as a competitive Wii killer. In 2008, Microsoft started Project Natal, named after the Microsoft General Manager of Incubation Alex Kipman's hometown in Brazil. The project's goal was to develop a device including depth recognition, motion tracking, facial recognition, and speech recognition based on the video recognition technology developed by PrimeSense.

Kinect for Xbox was launched in November 2010 and its launch was indeed a success: it was and it is still a break-through in the gaming world and it holds the Guinness World Record for being the "fastest selling consumer electronics device" ahead of the iPhone and the iPad.

In December 2010, PrimeSense (primesense.com) released a set of open source drivers and APIs for Kinect that enabled software developers to develop Windows applications using the Kinect sensor.

Finally, on June 17, 2011, Microsoft launched the Kinect SDK beta, a set of libraries and APIs that enable us to design and develop software applications on Microsoft platforms using the Kinect sensor as a multimodal interface.

With the launch of the Kinect for Windows device and the Kinect SDK, motion control computing is now a discipline that we can shape in our garages, writing simple and powerful software applications ourselves.

This book is written for all of us who want to develop market-ready software applications using Kinect for Windows that can track audio and video and control motion based on NUI. In an area where Kinect established itself in such a short span of time, there is the need to consolidate all the technical resources and develop them in an appropriate way: this is our zero-to-hero Kinect in motion journey. This is what this book is about.

This book assumes that you have a basic knowledge of C# and a passion for learning how to program Kinect devices. It can be enjoyed by anybody interested in knowing more about the device and in learning how to track audio and video using the Kinect for Windows Software Development Kit (SDK) 1.6. We deeply believe this book will help you to master the processing of the video, depth, and audio streams and to build market-ready applications that control motion. This book has deliberately been kept simple and concise, which will help you to quickly grasp the core and critical concepts.

Before jumping into the core of audio and visual tracking with Kinect for Windows, let's use this introductory chapter to understand the hardware and software architectures of Kinect for Windows and its SDK 1.6.

Hardware overview


The Kinect device is a horizontal bar composed of multiple sensors connected to a base with a motorized pivot.

The following image provides a schematic representation of all the main Kinect hardware components. Looking at the Kinect sensor from the front, it is possible to identify the Infrared (IR) Projector (1), the RGB camera (3), and the depth camera (2). An array of four microphones (6), the three-axis accelerometer (5), and the tilt motor (4) are arranged inside the plastic case.

Kinect case and components

The device is connected to a PC through a USB 2.0 cable. It needs an external power supply in order to work because USB ports don't provide enough power.

Now let's jump into the main features of its components.

The IR projector

The IR projector is the device that Kinect uses for projecting the IR rays that are used for computing the depth data. The IR projector, which from the outside looks like a common camera, is a laser emitter that constantly projects a pattern of structured IR dots at a wavelength of around 830 nm (patent US20100118123, Prime Sense Ltd.). This light beam is invisible to human eyes (which typically respond to wavelengths from about 390 nm to 750 nm) except for a bright red dot in the center of the emitter.

The pattern is composed of 3 x 3 subpatterns of 211 x 165 dots each (for a total of 633 x 495 dots). In each subpattern, one spot is much brighter than all the others.

As the dotted light (spot) hits an object, the pattern becomes distorted, and this distortion is analyzed by the depth camera in order to estimate the distance between the sensor and the object itself.

Infrared pattern

Note

In the previous image, we tested the IR projector against the room's wall. Note that a clear view of the infrared pattern can be obtained only by using an external IR camera (the left-hand side of the previous image). In the same picture taken with the internal RGB camera, the pattern looks distorted even though the beam is not hitting any object (the right-hand side of the previous image).

Depth camera

The depth camera is a (traditional) monochrome CMOS (complementary metal-oxide-semiconductor) camera that is fitted with an IR-pass filter (which blocks the visible light). The depth camera is the device that Kinect uses for capturing the depth data.

The depth camera is the sensor returning the 3D coordinates (x, y, z) of the scene as a stream. The sensor captures the structured light emitted by the IR projector and the light reflected from the objects inside the scene. All this data is converted into a stream of frames. Every single frame is processed by the PrimeSense chip, which produces an output stream of frames. The output resolution is up to 640 x 480 pixels. Each pixel, based on 11 bits, can represent 2048 levels of depth.

The following table lists the distance ranges:

Mode   | Physical limits              | Practical limits
-------|------------------------------|------------------------------
Near   | 0.4 to 3 m (1.3 to 9.8 ft)   | 0.8 to 2.5 m (2.6 to 8.2 ft)
Normal | 0.8 to 4 m (2.6 to 13.1 ft)  | 1.2 to 3.5 m (4 to 11.5 ft)
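
As a rough illustration of how an application might use these limits, the following Python sketch classifies a measured distance against the values in the table. The names RANGES_M and classify_distance are hypothetical and are not part of the Kinect SDK.

```python
# Distance ranges in metres, copied from the table above.
RANGES_M = {
    "near": {"physical": (0.4, 3.0), "practical": (0.8, 2.5)},
    "normal": {"physical": (0.8, 4.0), "practical": (1.2, 3.5)},
}


def classify_distance(distance_m, mode="normal"):
    """Return 'practical', 'physical', or 'out_of_range' for a distance."""
    limits = RANGES_M[mode]
    low, high = limits["practical"]
    if low <= distance_m <= high:
        return "practical"
    low, high = limits["physical"]
    if low <= distance_m <= high:
        return "physical"
    return "out_of_range"
```

A user standing 0.5 m from the sensor, for example, falls inside the physical limits of the near mode but outside both ranges of the normal mode.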

Note

The sensor doesn't work correctly in an environment affected by sunlight, reflective surfaces, or interference from light with a similar wavelength (approximately 830 nm).

The following figure is composed of two frames extracted from the depth image stream: the one on the left represents a scene without any interference, while the one on the right shows how interference can reduce the quality of the scene. In this frame, we introduced an infrared source that overlaps the Kinect's infrared pattern.

Depth images

The RGB camera

The RGB camera is similar to a common color webcam but, unlike a common webcam, it has no IR-cut filter, so the IR light reaches the CMOS sensor. The camera supports a resolution of up to 1280 x 960 pixels at 12 images per second. We can reach a frame rate of 30 images per second at a resolution of 640 x 480 with 8 bits per channel, producing a Bayer filter output with an RGGB pattern. This camera is also able to perform color flicker avoidance, color saturation operations, and automatic white balancing. This data is utilized to obtain the details of people and objects inside the scene.

The following monochromatic figure shows the infrared frame captured by the RGB camera:

IR frame from the RGB camera

Note

To obtain high-quality IR images we need dim lighting, whereas to obtain high-quality color images we need external light sources. It is therefore important to balance these two factors to optimize the use of the Kinect sensors.

Tilt motor and three-axis accelerometer

The Kinect cameras have a horizontal field of view of 57.5 degrees and a vertical field of view of 43.5 degrees. It is possible to increase the interaction space by adjusting the vertical tilt of the sensor between -27 and +27 degrees. The tilt motor can shift the Kinect head's angle upwards or downwards.

The Kinect also contains a three-axis accelerometer configured for a 2g range (g is the acceleration due to gravity) with a 1 to 3 degree accuracy. It is possible to determine the orientation of the device with respect to gravity by reading the accelerometer data.

The following figure shows how the field of view angle can be changed when the motor is tilted:

Field of view angle
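
The geometry described above can be sketched in a few lines of Python. This is only an illustration under a simple pinhole-camera assumption; the function names are hypothetical and do not belong to the Kinect SDK.

```python
import math

H_FOV_DEG = 57.5      # horizontal field of view, from the text
V_FOV_DEG = 43.5      # vertical field of view, from the text
MAX_TILT_DEG = 27.0   # tilt motor limit in each direction, from the text


def visible_extent(distance_m):
    """Width and height (in metres) of the area seen at a given distance."""
    width = 2 * distance_m * math.tan(math.radians(H_FOV_DEG / 2))
    height = 2 * distance_m * math.tan(math.radians(V_FOV_DEG / 2))
    return width, height


def clamp_tilt(angle_deg):
    """Limit a requested tilt angle to the range the motor supports."""
    return max(-MAX_TILT_DEG, min(MAX_TILT_DEG, angle_deg))


def vertical_coverage_deg():
    """Total vertical angle reachable by sweeping the tilt motor."""
    return V_FOV_DEG + 2 * MAX_TILT_DEG
```

Under these assumptions, sweeping the motor across its full range extends the vertical coverage from 43.5 degrees to 97.5 degrees, although only 43.5 degrees are visible at any one time.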

Microphone array

The microphone array consists of four microphones that are located in a linear pattern in the bottom part of the device, with a 24-bit Analog to Digital Converter (ADC). The captured audio is encoded using Pulse Code Modulation (PCM) with a sampling rate of 16 kHz and a 16-bit depth. The main advantages of this multi-microphone configuration are enhanced noise suppression, Acoustic Echo Cancellation (AEC), and the capability to determine the location and the direction of an audio source through a beam-forming technique.
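
To make the PCM figures concrete, the following Python sketch computes the size of a captured buffer and decodes raw bytes into samples. It assumes signed, little-endian 16-bit samples, which is the usual layout for PCM on Windows but is not stated in the text; the function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the text
BYTES_PER_SAMPLE = 2      # 16-bit depth, from the text


def pcm_bytes(duration_s, channels=1):
    """Number of bytes needed to hold duration_s seconds of audio."""
    return int(duration_s * SAMPLE_RATE_HZ) * channels * BYTES_PER_SAMPLE


def decode_pcm16(raw):
    """Decode raw bytes into signed 16-bit samples (assumed little-endian)."""
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack("<%dh" % count, raw[:count * BYTES_PER_SAMPLE]))
```

One second of single-channel audio at these settings therefore occupies 32,000 bytes.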

Software architecture

In this section we review the software architecture defining the SDK. The SDK is a composite set of software libraries and tools that help us to use the Kinect-based natural input. The Kinect senses real-world events, such as sound and movement, and makes them available for audio and visual tracking. The Kinect and its software libraries interact with our application via the NUI libraries, as detailed in the following figure:

Interaction diagram

The following diagram shows the software architecture of the Kinect for Windows SDK 1.6: its structural elements, the interfaces between them, and the way they collaborate:

Kinect for Windows SDK 1.6 software architecture diagram

The following list provides the details for the information shown in the preceding figure:

  • Kinect sensor: The hardware components as detailed in the previous section, and the USB hub through which the Kinect sensor is connected to the computer.

  • Kinect drivers: The Windows drivers for the Kinect, which are installed as part of the SDK setup process. The Kinect drivers are accessible in the %Windows%\System32\DriverStore\FileRepository directory and they include the following files:

    • kinectaudio.inf_arch_uniqueGUID

    • kinectaudioarray.inf_arch_uniqueGUID

    • kinectcamera.inf_arch_uniqueGUID

    • kinectdevice.inf_arch_uniqueGUID

    • kinectsecurity.inf_arch_uniqueGUID

    These files expose the information about each of the Kinect's capabilities. The Kinect drivers support the following:

    • The Kinect microphone array as a kernel-mode audio device that you can access through the standard audio APIs in Windows

    • Audio and video streaming controls for streaming audio and video (color, depth, and skeleton)

    • Device enumeration functions that enable an application to use more than one Kinect

  • Audio and video components defined by NUI APIs for skeleton tracking, audio, and color and depth imaging. You can review the NUI APIs header files in the %ProgramFiles%\Microsoft SDKs\Kinect\v1.6 folder as follows:

    • NuiApi.h: This aggregates all the NUI API headers

    • NuiImageCamera.h: This defines the APIs for the NUI image and camera services

    • NuiSensor.h: This contains the definitions for interfaces such as the audio beam, the audio array, and the accelerometer

    • NuiSkeleton.h: This defines the APIs for the NUI skeleton

  • DirectX Media Object (DMO) for microphone array beam-forming and audio source localization. The format of the input and output data used by a stream in a DirectX DMO is defined by the Microsoft.Kinect.DMO_MEDIA_TYPE and the Microsoft.Kinect.DMO_OUTPUT_DATA_BUFFER structs. The default facade Microsoft.Kinect.DmoAudioWrapper creates a DMO object using a registered COM server, and calls the native DirectX DMO layer directly.

  • Windows 7 standard APIs: The audio, speech, and media APIs in Windows 7, as described in the Windows 7 SDK and the Microsoft Speech SDK (Microsoft.Speech, System.Media, and so on). These APIs are also available to desktop applications in Windows 8.

Video stream

The stream of color image data is handled by the Microsoft.Kinect.ColorImageFrame. A single frame is then composed of color image data. This data is available in different resolutions and formats. You may use only one resolution and one format at a time.

The following table lists all the available resolutions and formats managed by the Microsoft.Kinect.ColorImageFormat struct:

Color image format               | Resolution | FPS | Data
---------------------------------|------------|-----|-----------------------
InfraredResolution640x480Fps30   | 640 x 480  | 30  | Pixel format is Gray16
RawBayerResolution1280x960Fps12  | 1280 x 960 | 12  | Bayer data
RawBayerResolution640x480Fps30   | 640 x 480  | 30  | Bayer data
RawYuvResolution640x480Fps15     | 640 x 480  | 15  | Raw YUV
RgbResolution1280x960Fps12       | 1280 x 960 | 12  | RGB (X8R8G8B8)
RgbResolution640x480Fps15        | 640 x 480  | 15  | Raw YUV
Undefined                        | N/A        | N/A | N/A

Note

When we use the InfraredResolution640x480Fps30 format, two bytes in the byte array returned for each frame make up one single pixel value. The bytes are in little-endian order, so for the first pixel, the first byte is the least significant byte (with the least significant 6 bits of this byte always set to zero), and the second byte is the most significant byte.
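
The byte layout described in this note can be decoded as in the following Python sketch. It works on a plain byte buffer and makes no SDK calls; the function names are hypothetical.

```python
def decode_ir_frame(raw):
    """Convert a little-endian, two-bytes-per-pixel buffer into pixel values."""
    if len(raw) % 2:
        raise ValueError("expected two bytes per pixel")
    # First byte is the least significant, second byte the most significant.
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]


def ir_intensity_10bit(pixel):
    """Drop the six always-zero low bits, leaving a 10-bit intensity."""
    return pixel >> 6
```

Because the lowest six bits are always zero, shifting them away leaves a value between 0 and 1023.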

The X8R8G8B8 format is a 32-bit RGB pixel format, in which 8 bits are reserved for each color.
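
As a small illustration, the following Python sketch splits a 32-bit X8R8G8B8 value into its three color components, treating the top 8 bits (the X) as unused. It operates on an integer value rather than on a raw buffer, so it makes no assumption about how the bytes are ordered in memory; the function names are hypothetical.

```python
def unpack_x8r8g8b8(pixel):
    """Split a 32-bit X8R8G8B8 value into (red, green, blue)."""
    red = (pixel >> 16) & 0xFF
    green = (pixel >> 8) & 0xFF
    blue = pixel & 0xFF
    return red, green, blue


def pack_x8r8g8b8(red, green, blue):
    """Build a 32-bit X8R8G8B8 value, leaving the unused X byte at zero."""
    return (red << 16) | (green << 8) | blue
```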

Raw YUV is a 16-bit pixel format. While using this format, we can notice the video data has a constant bit rate, because each frame is exactly the same size in bytes.

In case we need to increase the quality of the default conversion done by the SDK from Bayer to RGB, we can utilize the Bayer data provided by the Kinect and apply a customized conversion optimized for our central processing units (CPUs) or graphics processing units (GPUs).
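
To give an idea of what a custom conversion involves, here is a deliberately simple Python sketch that collapses each 2 x 2 Bayer cell into one RGB pixel, halving the resolution. It assumes one byte per sensor site and an RGGB cell layout (red top-left, blue bottom-right); it is not the conversion used by the SDK, and a production-quality converter would interpolate at full resolution.

```python
def demosaic_rggb_half(raw, width, height):
    """Collapse each 2x2 RGGB cell into one (r, g, b) pixel.

    raw holds one byte per sensor site in row-major order (an assumption).
    """
    if width % 2 or height % 2 or len(raw) != width * height:
        raise ValueError("buffer size does not match an even-sized frame")
    rows = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            red = raw[y * width + x]
            green_1 = raw[y * width + x + 1]
            green_2 = raw[(y + 1) * width + x]
            blue = raw[(y + 1) * width + x + 1]
            # Average the two green sites of the cell.
            row.append((red, (green_1 + green_2) // 2, blue))
        rows.append(row)
    return rows
```

The same loop structure maps naturally onto a GPU kernel, where each thread would handle one 2 x 2 cell.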

Note

Due to the limited transfer rate of USB 2.0, in order to handle 30 FPS, the images captured by the sensor are compressed and converted into RGB format. The conversion takes place before the image is processed by the Kinect runtime. This affects the quality of the images themselves.

In the SDK 1.6 we can customize the camera settings to optimize and adapt the color camera for our environment (when we need to work in a low light or a brightly lit scenario, adapt contrast, and so on). In managed code, the Microsoft.Kinect.ColorCameraSettings class exposes all the settings we may want to adjust and customize.
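
A back-of-the-envelope calculation shows why the transfer rate matters. The following Python sketch computes the uncompressed data rate of a video format and compares it with the nominal 480 Mbit/s signalling rate of USB 2.0. The bytes-per-pixel values used in the example (4 for 32-bit RGB, 2 for Raw YUV, 1 for 8-bit Bayer data) are assumptions derived from the pixel formats described in this chapter, and the function name is hypothetical.

```python
USB2_NOMINAL_MBIT_S = 480  # nominal USB 2.0 high-speed signalling rate


def raw_mbit_per_s(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate of a video stream in megabits per second."""
    return width * height * bytes_per_pixel * fps * 8 / 1_000_000


# Example: 32-bit RGB at 640 x 480 and 30 FPS.
rgb_rate = raw_mbit_per_s(640, 480, 4, 30)
rgb_share = rgb_rate / USB2_NOMINAL_MBIT_S
```

With these assumptions, a 32-bit RGB stream at 640 x 480 and 30 FPS needs roughly 295 Mbit/s, well over half of the nominal bus rate before the depth and audio data are taken into account, whereas 8-bit Bayer data at the same resolution and frame rate needs about a quarter of that.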

Note

In native code we have to use the Microsoft.Kinect.Interop.INuiColorCameraSettings interface instead.

In order to improve the external camera calibration we can use the IR stream to test the pattern observed from both the RGB and IR cameras. This enables us to have a more accurate mapping of coordinates from one camera space to another.

Depth stream

The data provided by the depth stream is useful in motion control computing for tracking a person's motion as well as identifying background objects to ignore.

The depth stream is a stream of data in which each pixel of every frame contains the distance (in millimeters) from the camera itself to the nearest object.

The depth data stream (Microsoft.Kinect.DepthImageStream) exposes two distinct types of data through the Microsoft.Kinect.DepthImageFrame:

  • Depth data calculated in millimeters (exposed by the Microsoft.Kinect.DepthImagePixel struct).

  • Player segmentation data. This data is exposed by the Microsoft.Kinect.DepthImagePixel.PlayerIndex property, identifying the unique player detected in the scene.

The following table defines the characteristics of the depth image frame:

Depth image format      | Resolution | Frame rate
------------------------|------------|-----------
Resolution640x480Fps30  | 640 x 480  | 30 FPS
Resolution320x240Fps30  | 320 x 240  | 30 FPS
Resolution80x60Fps30    | 80 x 60    | 30 FPS
Undefined               | N/A        | N/A

The Kinect runtime processes depth data to identify up to six human figures in a segmentation map. The segmentation map is a bitmap of Microsoft.Kinect.DepthImagePixel, where the PlayerIndex property identifies the closest person to the camera in the field-of-view. In order to obtain player segmentation data, we need to enable the skeletal stream tracking.

Microsoft.Kinect.DepthImagePixel was introduced in the SDK 1.6 and defines what is called the "Extended Depth Data", or full depth information: each single pixel is represented by a 16-bit depth and a 16-bit player index.
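
The following Python sketch shows how a buffer of such pixels could be unpacked and queried. It assumes that each pixel is stored as two little-endian signed 16-bit values with the player index first and the depth second; this ordering is an assumption, not something stated in the text, and the function names are hypothetical.

```python
import struct

# Assumed layout: player index first, then depth in millimeters.
PIXEL_STRUCT = struct.Struct("<hh")


def unpack_depth_pixels(raw):
    """Return a list of (depth_mm, player_index) tuples from a raw buffer."""
    return [(depth, player) for player, depth in PIXEL_STRUCT.iter_unpack(raw)]


def player_mask(pixels, player_index):
    """True for every pixel that belongs to the given player."""
    return [player == player_index for _, player in pixels]


def nearest_depth_mm(pixels, player_index):
    """Smallest positive depth among the pixels of the given player."""
    depths = [d for d, p in pixels if p == player_index and d > 0]
    return min(depths) if depths else None
```

A mask like this is the basis of the segmentation map: pixels with a non-zero player index belong to a detected person, and everything else can be treated as background.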

Note

Note that the sensor is not capable of capturing infrared streams and color streams simultaneously. However, you can capture infrared and depth streams simultaneously.

Audio stream

Thanks to the microphone array, the Kinect provides an audio stream that we can control and manage in our application for audio tracking, voice recognition, high-quality audio capturing, and other interesting scenarios.

By default, Kinect tracks the loudest audio input. That said, we can also direct the microphone array programmatically (towards a given location, following a tracked skeleton, and so on).

DirectX Media Object (DMO) is the building block used by Kinect for processing audio streams.

Note

In native scenarios, in addition to the DirectX Media Object (DMO), we can use the Windows Audio Session API (WASAPI) too.

In managed applications, the Microsoft.Kinect.KinectAudioSource class (exposed by the KinectSensor.AudioSource property) is the key software architecture component concerning the audio stream. Through Microsoft.Kinect.INativeAudioWrapper, it wraps the DirectX Media Object (DMO), which is a common Windows component for a single-channel microphone.

The KinectAudioSource class does not just wrap the DMO; it also introduces additional abilities such as:

  • The _MIC_ARRAY_MODE as an additional microphone mode to support the Kinect microphone array.

  • Beam-forming and source localization.

  • The _AEC_SYSTEM_MODE Acoustic Echo Cancellation (AEC). The SDK supports mono sound cancellation only.

Audio input range

Note

In order to increase the quality of the sound, audio inputs coming from the sensor get up to 20 dB of suppression. The microphone array allows an optional additional 6 dB of ambient noise removal for audio coming from behind the sensor.

The audio input has a range of +/– 50 degrees (as visualized in the preceding figure) in front of the sensor. We can point the audio direction programmatically in 10 degree increments in order to focus our attention on a given user or to avoid noise sources.
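
The constraint on the beam direction can be expressed as a small helper that clamps a desired angle to the supported range and snaps it to the nearest increment. This Python sketch is only an illustration of that arithmetic; snap_beam_angle is a hypothetical name, not an SDK call.

```python
import math

MAX_BEAM_ANGLE_DEG = 50   # range of +/- 50 degrees, from the text
BEAM_STEP_DEG = 10        # 10 degree increments, from the text


def snap_beam_angle(angle_deg):
    """Clamp an angle to the supported range and snap it to a valid step."""
    clamped = max(-MAX_BEAM_ANGLE_DEG, min(MAX_BEAM_ANGLE_DEG, angle_deg))
    # Round half up so that values midway between steps behave predictably.
    return int(math.floor(clamped / BEAM_STEP_DEG + 0.5)) * BEAM_STEP_DEG
```

For instance, a speaker located at 23 degrees would be targeted with a beam at 20 degrees, while a request for 73 degrees to the left would be limited to -50 degrees.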

Skeleton

In addition to the data provided by the depth stream, we can use the data provided by skeleton tracking to enhance the motion control computing capabilities of our applications when it comes to recognizing people and following their actions.

We define the skeleton as a set of positioned key points. A detailed skeleton contains 20 points in normal mode and 10 points in seated mode, as shown in the following figure. Every single point of the skeleton highlights a joint of the human body.

Thanks to the depth (IR) camera, Kinect can recognize up to six people in the field of view. Of these, up to two can be tracked in detail.

The stream of skeleton data is maintained by the Microsoft.Kinect.SkeletonStream class and the Microsoft.Kinect.SkeletonFrame class. The skeleton data is exposed for each single point in the 3D space by the Microsoft.Kinect.SkeletonPoint struct. In any single frame handled by the skeleton stream we can manage up to six skeletons using an array of the Microsoft.Kinect.Skeleton class.

Skeleton in normal and seated mode
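
Since only two of the six recognized people can be tracked in detail, an application may want its own policy for deciding who they are. The following Python sketch shows one possible policy, picking the two people closest to the sensor. The input format, a list of (tracking ID, distance) pairs, and the function name are hypothetical and do not mirror the SDK's data structures.

```python
MAX_RECOGNIZED = 6   # people recognized in the field of view, from the text
MAX_TRACKED = 2      # people tracked in detail, from the text


def choose_tracked(candidates):
    """Return the IDs of the nearest people, at most MAX_TRACKED of them.

    candidates is a list of (tracking_id, distance_m) pairs (hypothetical).
    """
    if len(candidates) > MAX_RECOGNIZED:
        raise ValueError("at most six people can be recognized")
    nearest = sorted(candidates, key=lambda item: item[1])[:MAX_TRACKED]
    return [tracking_id for tracking_id, _ in nearest]
```

Other policies, such as preferring the most active person or the one who spoke last, fit the same shape.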

Summary


In this chapter we introduced Kinect, looking at the key architectural aspects such as the hardware composition and the SDK 1.6 software components. We walked through the color sensor, IR depth sensors, IR emitter, microphone arrays, the tilt motor for changing the Kinect camera angles, and the three-axis accelerometer.

Kinect generates two video streams: one from the color camera data and one carrying the depth information from the depth sensor. Kinect can detect up to six users in its field of view and produce a detailed skeleton for two of them. All these characteristics make Kinect an excellent tool for tracking motion through video. The Kinect's audio tracking makes the device a remarkable interface for voice recognition. Combining video and audio, Kinect and its SDK 1.6 are an outstanding technology for NUI.

Kinect is not just technology; it is a means of elevating the way users interact with complex software applications and systems. It is a breakthrough in how we can build NUIs and multimodal interfaces.

Kinect opens up unlimited opportunities for developers and software architects to design and create modern applications for different industries and lines of business.

The following examples are not meant to be an exhaustive list, but just a starting point that can inspire your creativity and increase your appetite for this technology.

  • Healthcare: This improves the physical rehabilitation process by constantly capturing data about the motion and posture of the patient. We can enhance this scenario by allowing doctors to remotely check the patient data streamed by the Kinect sensor.

  • Education/Professional development: This helps in creating safe and more engaging environments based on gamification where students, teachers, and professionals can exercise activities and knowledge. The level of engagement can be increased even further using augmented reality.

  • Retail: This engages customers across multiple channels using the Kinect's multimodal interface. Kinect can be used as a navigation system for virtual windows while shopping online and/or visiting infotainment kiosks.

  • Home automation: This is also known as domotics where, thanks to the Kinect audio and video tracking, we can interact with all the electrical devices installed in our home (lights, washing machine, and so on).

In the next chapter, we will start to develop with the Kinect SDK, utilizing the depth and RGB camera streams. The applied examples will enable our application to optimize the way we manage and tune the streams themselves.


Key benefits

  • Step-by-step examples on how to master the essential features of Kinect technology
  • Fully-functioning code samples ready to expand and adjust to your needs
  • Compact and handy reference on how to adopt a multimodal user interface in your application

Description

Kinect is a motion-sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. It provides capabilities to enhance human-machine interaction along with a zero-to-hero journey to engage the user in a multimodal interface dialog with your software solution. Kinect in Motion - Audio and Visual Tracking by Example guides you in developing more than five models you can use to capture gestures, movements, and spoken voice commands. The examples and the theory discussed provide you with the knowledge to let the user become a part of your application. Kinect in Motion - Audio and Visual Tracking by Example is a compact reference on how to master color, depth, skeleton, and audio data streams handled by Kinect for Windows.

Starting with an introduction to Kinect and its characteristics, you will first be shown how to master the color data stream with no more than one page of lines of code. Learn how to manage the depth information and map it against the color data. You will then learn how to define and manage gestures that enable the user to instruct the application simply by moving arms or any other type of natural action. Finally you will complete your journey through a multimodal interface, combining gestures with audio.

The book will lead you through many detailed, real-world examples, and even guide you on how to test your application.

Who is this book for?

Kinect in Motion - Audio and Visual Tracking by Example is great for developers new to the Kinect for Windows SDK, and who are looking to get a good grounding in how to master video and audio tracking. It's assumed that you have some experience in C# and XAML already.

What you will learn

  • Tune the captured color data stream to adjust the output to the environmental condition
  • Detect simple actions, such as arm movement, to raise events in your application
  • Debug and test your application to increase the quality of the software delivered
  • Track users whether they are seated or standing so that your application can interact with them
  • Capture sounds to convert the vocal input into application commands
  • Adjust the Kinect angle programmatically to optimize the view angle according to the user position and the environment characteristics

Product Details

Publication date : Apr 25, 2013
Length: 112 pages
Edition : 1st
Language : English
ISBN-13 : 9781849697187
Vendor : Microsoft


Table of Contents

4 Chapters
Kinect for Windows – Hardware and SDK Overview
Starting with Image Streams
Skeletal Tracking
Speech Recognition

Customer reviews

Rating: 4.8 out of 5 (6 ratings)
5 star 83.3%
4 star 16.7%
3 star 0%
2 star 0%
1 star 0%

carla Jul 02, 2013
Rating: 5 out of 5
This book has a very straight-forward approach to Kinect programming. By keeping it simple and concise you can easily understand and get into Kinect! It's even got a hardware overview, which I found great in order to further understand. Even though it is a small book, you'll be amazed at how quickly you'll start programming with Kinect just by following the examples. If you have any programming basis and want to get started with Kinect, this book is a great tool.
Amazon verified review
Dick Mandemaker Jul 22, 2013
5 out of 5
Although I'm not a developer pur sang, this book gave me great insight into how Kinect works and how you can use it to develop applications that use video and audio tracking. Chapter 1 describes how Kinect works and what the current state of tracking enables. The next chapters give you enough clues to develop your own applications, and the notes and summary in each chapter make reading and understanding easy. Finally, in the appendix you will learn how to save time coding and testing Kinect-enabled applications by recording all the video data coming into an application from a Kinect sensor with Kinect Studio, then injecting the recorded video into an application, allowing you to test your code without getting out of your chair over and over again. All in all, this book is worth reading!
Amazon Verified review
Matteo Jun 28, 2013
5 out of 5
This is a short and very easy to read book, but it is very useful. Clemente Giorio and Massimo Fascinari have taken a very look leading approach to drawing you in to how to use the Kinect SDK and get used to NUI style development. After a fast introduction to the hardware (chapter 1), the book builds up very nicely on all the features enabled with Kinect, like Cameras (chapter 2), Skeletons (chapter 3), Speech recognition (chapter 4) and useful tools (chapter 5). There are a lot of examples that help you understand all the features and gradually go deeper into development with Kinect. This book is also useful for expert developers because it introduces interesting tricks. I highly recommend this book for anyone interested in delving into Kinect.
Amazon Verified review
Lector Amante de los Libros Jul 24, 2013
5 out of 5
Giorio and Fascinari have done excellent work on this book. It is a very helpful aid for getting involved in the fascinating world of Kinect for Windows, motion control computing, and the techniques for designing natural user interfaces (NUIs). Chapter 1, Kinect for Windows - Hardware and SDK Overview: An easy, yet thorough explanation of the Kinect sensor hardware architecture and its sensors. The reader of this chapter will finally understand the secrets behind the Kinect sensor and how it tracks its users and their movements. Chapter 2, Starting with Image Streams: This chapter guides the reader through the implementation of code to capture data from the color stream, depth stream, and IR stream. At the same time, the reader of this chapter becomes familiar with the different mechanisms to handle images. Chapter 3, Skeletal Tracking: Perhaps the most fun part of this book. The programmer will learn how to track motion and get control of the Kinect for Windows users' movement by manipulating and controlling the skeletons in both the default mode and the seated mode (available in Kinect for Windows but not in Kinect for XBOX360). Chapter 4, Speech Recognition: This chapter helps the reader understand how to manipulate the Kinect sensor audio stream data, how to work with grammars defined both by XML files and programmatically, and how to track audio coming from different directions. Definitely a must read for everyone interested in programming applications for Kinect for Windows. The only drawback is that this book focuses on version 1.6 of the Kinect for Windows SDK, and at the time of its publication Microsoft had already made version 1.7 available. Programmers should definitely refer to the newest SDK documentation in addition to reading this book in order to benefit from the new tools and features.
Amazon Verified review
Keith Harvey May 31, 2013
5 out of 5
I really love technical books that get to the point! I can't stand most of the tomes that spend 100+ pages explaining the history of computer science or other nonsense. Kinect in Motion is NOT one of those books. It gets to the point of teaching you about audio and visual tracking and gives you some great development workflow tips along the way. When the authors do cover background information, it's directly related to learning the task at hand. Very cool! I was personally interested in this book because I've been wanting to understand Kinect's audio tracking abilities. While there are samples and blog posts floating around out there, I just never really got the whole picture of how things work. I've always wanted to build an intelligent video conferencing system that tracks users as they speak, and now I have what I need to make it happen. The book really is good at explaining motion tracking, and especially the nuances around folks who are standing and sitting. For my goal of creating an intelligent video conferencing system, most people are sitting, so this was very helpful. Overall, the style of this book is clean and focused, and sometimes you get little bits of humor, which I appreciate. If you want to learn and, more importantly, UNDERSTAND Kinect audio and visual tracking, I highly recommend this book! Cheers!
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.