Human pose estimation is the process of estimating the configuration of the body (its pose) from an image or video. It involves detecting landmarks (keypoints) that correspond to body parts such as the feet, ankles, chin, shoulders, elbows, hands, head, and so on. We will do this automatically using deep learning. If you consider a face, the landmarks are relatively rigid or, rather, relatively constant from face to face, such as the relative position of the eyes to the nose, the mouth to the chin, and so forth.
The following photo provides an example:
Although the overall body structure remains the same, our bodies aren't rigid. So, we need to detect the different parts of the body relative to one another. For example, detecting the feet relative to the knees is far more challenging than detecting facial landmarks. Also, we can move our hands and feet through a much wider range of positions than facial features, which adds to the difficulty.
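To make this concrete, here is a minimal sketch of detecting body landmarks with a pretrained deep learning model. It assumes torchvision's Keypoint R-CNN (pretrained on COCO, which predicts 17 keypoints per person, including the shoulders, elbows, wrists, hips, knees, and ankles) and a placeholder image path, `person.jpg`; the approach we follow later in this chapter may use a different model.

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Load a Keypoint R-CNN model pretrained on COCO and switch to inference mode.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
model.eval()

# "person.jpg" is a placeholder path for any image containing one or more people.
image = Image.open("person.jpg").convert("RGB")
tensor = transforms.ToTensor()(image)

# The model takes a list of image tensors and returns one prediction dict per image.
with torch.no_grad():
    prediction = model([tensor])[0]

# Keep only confident person detections and print their landmarks.
for score, keypoints in zip(prediction["scores"], prediction["keypoints"]):
    if score > 0.9:
        # keypoints has shape (17, 3): x, y, and a visibility flag per landmark.
        for x, y, visible in keypoints.tolist():
            print(f"x={x:.1f}, y={y:.1f}, visible={visible > 0}")
```

The keypoint coordinates are in pixel space, so they can be drawn directly on top of the input image to visualize the detected pose.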