Deep learning is a subfield of artificial intelligence focused on training neural networks to perform complex tasks, while computer vision deals with extracting information from visual data. We can use such fields in numerous detection tasks, such as pose detection. In this answer, we'll be implementing this concept.
Pose detection, also known as human pose estimation, involves identifying the key points in a human body, like hands, legs, joints, and other body parts, from an image or video.
A more specific application of pose detection is gesture recognition, where hand poses are analyzed to identify particular gestures. For instance, recognizing a thumbs-up gesture can be used to interact with devices or even control virtual elements.
Note: Learn how to implement gesture recognition here.
MediaPipe is an open-source framework that offers a collection of pre-trained deep-learning models, including pose detection. The main advantage is that such a model can be easily integrated into our custom computer vision applications.
To demonstrate pose detection, we will use MediaPipe's pose_landmarker.task model. This pre-trained model allows us to recognize the various landmarks in one's pose.
Note: You can download this model here.
Pose landmarks are key points on the human body that define its pose and positioning. These landmarks represent specific body parts, like hands, legs, and joints, and are essential for accurately interpreting and recognizing human actions and gestures.
To keep things simple, we'll start off by understanding how to apply this model to single images.
We will use Python and OpenCV for image processing and visualization, along with MediaPipe for pose detection.
```python
import cv2
import mediapipe as mp
```
First and foremost, we import the necessary libraries for our code.
```python
img_file = "pose1.png"
img = cv2.imread(img_file)
```
Next, we define the image file we want to process using img_file and read it using the cv2.imread method, saving the result as img.
```python
mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)
```
This is a crucial step, where we create an instance of the pose detection solution, mp_pose, with specified confidence thresholds for detection and tracking. A pose landmark is only considered valid if the model's certainty meets or exceeds these thresholds.
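Beyond the detection and tracking thresholds, MediaPipe also reports a per-landmark visibility score, which we can filter with our own threshold. The sketch below illustrates the idea with made-up landmark names and scores rather than real detection output:

```python
# A minimal sketch of applying a confidence threshold to landmarks.
# The landmark names and visibility scores below are illustrative
# stand-ins, not output from an actual MediaPipe detection.

def filter_by_visibility(landmarks, threshold=0.5):
    """Keep only landmarks whose visibility score meets the threshold."""
    return [(name, score) for name, score in landmarks if score >= threshold]

detected = [("nose", 0.98), ("left_elbow", 0.31), ("right_knee", 0.76)]
confident = filter_by_visibility(detected, threshold=0.5)
print(confident)  # landmarks below 0.5 visibility are dropped
```

In a real application, the same filtering could be applied to results.pose_landmarks.landmark, where each entry carries a visibility attribute.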
```python
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
results = mp_pose.process(rgb_img)
```
To process the image with the pose detection model, we first convert it to the RGB format using cv2.cvtColor, since OpenCV reads images in BGR order while MediaPipe expects RGB. The results of the detection are stored in results.
```python
copied_image = img.copy()
if results.pose_landmarks:
    mp.solutions.drawing_utils.draw_landmarks(
        copied_image,
        results.pose_landmarks,
        mp.solutions.pose.POSE_CONNECTIONS,
        mp.solutions.drawing_styles.get_default_pose_landmarks_style())
```
We then create a copy of the original image, copied_image, to draw the pose landmarks on. If pose landmarks are detected (i.e., the if statement evaluates to true), we visualize them on the copied image using the mp.solutions.drawing_utils.draw_landmarks function, while mp.solutions.drawing_styles.get_default_pose_landmarks_style() supplies the default styling for the landmarks.
```python
cv2.imshow("Detecting poses", copied_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
After visualizing the pose landmarks on the image, we display the image using cv2.imshow
. The displayed image will show the pose landmarks and will remain open until a key is pressed.
If no pose landmarks are detected in the image, nothing is drawn and the copy is displayed unchanged; you could add an else branch that prints a message such as "No poses were detected!".
```python
import cv2
import mediapipe as mp

img_file = "pose1.png"
img = cv2.imread(img_file)

mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
results = mp_pose.process(rgb_img)

copied_image = img.copy()
if results.pose_landmarks:
    mp.solutions.drawing_utils.draw_landmarks(
        copied_image,
        results.pose_landmarks,
        mp.solutions.pose.POSE_CONNECTIONS,
        mp.solutions.drawing_styles.get_default_pose_landmarks_style())

cv2.imshow("Detecting poses", copied_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
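MediaPipe returns each landmark's (x, y) coordinates normalized to the range [0, 1], so locating a landmark on the actual image requires scaling by the image's width and height. Here is a minimal sketch of that conversion; the sample coordinates are illustrative, not real detection output:

```python
# MediaPipe landmark coordinates are normalized to [0, 1].
# This helper converts them to integer pixel positions for a
# given image size; the sample values below are made up.

def to_pixel_coords(norm_x, norm_y, img_width, img_height):
    """Scale normalized landmark coordinates to pixel positions."""
    return int(norm_x * img_width), int(norm_y * img_height)

# e.g. a landmark at (0.5, 0.25) on a 640x480 image:
print(to_pixel_coords(0.5, 0.25, 640, 480))  # (320, 120)
```

In the pose detection code above, the same conversion would be applied to each entry of results.pose_landmarks.landmark, using img.shape to obtain the image dimensions.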
We can also detect the changing poses in videos! This can be achieved by considering each frame of the video as a single image and the landmarks being applied to each image. When this process is carried out continuously, we get a video with pose landmarks changing in every frame. Let's see the code in action.
```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

cap = cv2.VideoCapture("https://player.vimeo.com/external/206207511.sd.mp4?s=797bab17ff9fce2a8973fd5c6c161d8d80f76f7b&profile_id=164&oauth2_token_id=57447761")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = mp_pose.process(rgb_frame)
    if results.pose_landmarks:
        mp.solutions.drawing_utils.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp.solutions.pose.POSE_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_pose_landmarks_style())
    cv2.imshow("Detecting poses in videos", frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
```
In the code above, we iterate through each frame of the video with a while loop that runs as long as the video is streaming. We read each frame and then apply the same pose detection steps discussed for images. Streaming ends once the video is over or the "Esc" key (cv2.waitKey(1) == 27) is pressed.
Now, let's take a look at how poses are continuously detected in each video frame below.
Note: You can change the link of the video to any video of your choice or even use local video clips.
Passing 0 to the cv2.VideoCapture function redirects the video stream to your local machine's webcam. In this way, you can use the code to detect shifts in your poses in real time!
```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = mp_pose.process(rgb_frame)
    if results.pose_landmarks:
        mp.solutions.drawing_utils.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp.solutions.pose.POSE_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_pose_landmarks_style())
    cv2.imshow("Detecting poses using the webcam", frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
```
Note: You can run the above code on your local machine in order to connect to your web camera (if any).
Let's explore a few interesting applications where pose detection technology proves genuinely useful.
| Applications | Explanation |
| --- | --- |
| Human-computer interaction | Enables natural interaction with computers using gestures. |
| Sports analysis | Analyzes athletes' movements to improve performance. |
| Augmented reality | Integrates virtual content based on user body poses. |
| Fitness and health | Monitors and analyzes body postures during exercise. |
| Action recognition | Recognizes human actions from videos. |
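A common building block behind the fitness and sports applications above is computing the angle at a joint (for example, the elbow during a bicep curl) from three landmark positions. Here is a minimal sketch assuming 2D (x, y) coordinates; the sample points are made-up illustration data, not real landmarks:

```python
import math

# Angle at a joint computed from three 2D landmark positions,
# e.g. shoulder -> elbow -> wrist. The coordinates below are
# illustrative, not real MediaPipe output.

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# Shoulder, elbow, and wrist forming a right angle:
print(joint_angle((0, 0), (0, 1), (1, 1)))  # 90.0
```

A fitness application could track this angle across video frames, counting a repetition each time it crosses chosen thresholds.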
Note: Here's the complete list of related projects in MediaPipe or deep learning.
Test your knowledge of pose detection!
Which MediaPipe function do we use to draw the pose detection landmarks on the image?
Free Resources