Action recognition is a key part of computer vision. It involves recognizing the positions of the human hands, legs, head, and body in order to detect specific movements and classify them into well-known categories. The difficulty lies in three areas: variations in the visual input (such as the body being obscured by clutter or covered with clothing), actions that look similar but belong to different categories (such as drinking water and talking on a handheld cell phone), and obtaining representative training data.
This chapter provides a detailed overview of the key methods we can use for human pose estimation and action recognition. Action recognition combines pose estimation with acceleration-based activity recognition, as well as with video-based and three-dimensional point cloud-based action recognition. The theory will be supplemented by an explanation of its implementation...