How skeleton tracking works
The Kinect sensor returns raw depth data from which we can easily identify the pixels that represent players. Skeleton tracking, however, is not just about locating joints by reading the player information; it tracks the movement of the complete body. Real-time human pose recognition is challenging because of wide variation in body poses (a single body part can move in thousands of different directions and ways), body sizes, clothing (which differs from user to user), heights (users may be tall, short, or of medium height), and so on.
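To make the first step concrete, the sketch below shows how player pixels can be picked out of raw depth data. It assumes the Kinect for Windows SDK v1 packed depth format, in which each 16-bit depth pixel stores a player index in its low 3 bits (0 meaning "no player") and the depth in millimetres in the remaining bits. The helper names `unpack_depth_pixel` and `player_mask` are hypothetical illustrations, not SDK functions. Note that this only segments players from the background; it is not skeleton tracking itself.

```python
from typing import List, Tuple

# Assumption: SDK v1 packed depth format (player index in the low 3 bits).
PLAYER_INDEX_BITS = 3
PLAYER_INDEX_MASK = (1 << PLAYER_INDEX_BITS) - 1  # 0b111


def unpack_depth_pixel(raw: int) -> Tuple[int, int]:
    """Hypothetical helper: split one raw pixel into (depth_mm, player_index).

    A player_index of 0 means the pixel does not belong to any player.
    """
    player_index = raw & PLAYER_INDEX_MASK
    depth_mm = raw >> PLAYER_INDEX_BITS
    return depth_mm, player_index


def player_mask(raw_frame: List[int], width: int) -> List[List[bool]]:
    """Hypothetical helper: 2D mask that is True wherever a player was detected."""
    if width <= 0 or len(raw_frame) % width != 0:
        raise ValueError("frame length must be a positive multiple of width")
    rows = []
    for start in range(0, len(raw_frame), width):
        row = raw_frame[start:start + width]
        rows.append([(px & PLAYER_INDEX_MASK) != 0 for px in row])
    return rows


# Usage: a 2x3 frame with background at 3000 mm and player 1 at 1500 mm.
frame = [
    (3000 << 3) | 0, (1500 << 3) | 1, (1500 << 3) | 1,
    (3000 << 3) | 0, (1500 << 3) | 1, (3000 << 3) | 0,
]
print(unpack_depth_pixel(frame[1]))  # (1500, 1)
print(player_mask(frame, 3))         # [[False, True, True], [False, True, False]]
```

Masking and shifting keeps both values in a single integer per pixel, which is why the segmentation step is cheap compared with the pose recognition that follows.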
To overcome these problems and to track joints irrespective of body pose, Kinect uses a rendering pipeline that matches the incoming data (the raw depth data from the sensor) against sample trained data. The human pose recognition algorithm was developed using several base character models that varied in height, size, clothing, and several other factors. The machine-learned data is collected...