Augmented reality (AR) is a technology that overlays digital information onto the real world. This information can be a 3D model, an image, a video, or other data. Such functionality enables users to interact with digital assets in the physical world, and one natural way to drive that interaction is through hand gestures. To implement this type of interactivity, we need computer vision models and a way to run them alongside the AR functionality. Before discussing the libraries that can be used to implement this, let us look at the general steps involved.
When developing any software, the first step is to have a clear mind map of the process to follow. The following image shows a simple mind map for this task:
Let us now walk through the mind map in detail:
Choose a library: We can use several libraries to implement hand gesture recognition. A few popular ones include Vuforia, OpenCV, and Manomotion.
Choose a hand gesture recognition model: We can take two approaches. One is to use a pre-trained model, and the other is to train a model by ourselves and then use it. Let us discuss both these options:
Pre-trained model: Many pre-trained models are readily available; the popular libraries mentioned above can be used to access them.
Train our own model: The first step is to collect or generate a dataset of labeled hand gestures. We then train a gesture recognition model on it; the model can be a convolutional neural network (CNN), a recurrent neural network (RNN), or a 3D CNN. (A minimal training-side sketch in Python follows this list.)
Integrate with AR: Use the detected gestures to drive interactions with the content we augment. The interaction can be anything we choose, but note that each gesture must be mapped to a corresponding interaction for proper AR integration, as shown in the sketch below.
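To make the "train our own model" and gesture-mapping steps more concrete, here is a minimal Python sketch, assuming PyTorch is installed. The Unity-based libraries above use C#, but the idea translates directly. The `GestureCNN` architecture, the gesture labels, and the `GESTURE_ACTIONS` mapping are illustrative assumptions, not any library's API:

```python
# A minimal sketch of a CNN gesture classifier and a gesture-to-interaction
# mapping. The architecture, image size, and gesture labels are hypothetical.
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Classifies a 64x64 grayscale hand image into one of three gestures."""
    def __init__(self, num_gestures: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Each recognized gesture must map to one AR interaction (hypothetical labels).
GESTURE_ACTIONS = {
    0: "rotate_model",   # e.g., an open palm rotates the 3D model
    1: "scale_model",    # e.g., a pinch scales the model
    2: "place_model",    # e.g., a fist anchors the model in place
}

model = GestureCNN()
frame = torch.rand(1, 1, 64, 64)              # stand-in for a camera frame
gesture_id = model(frame).argmax(dim=1).item()
print(f"Detected gesture {gesture_id} -> {GESTURE_ACTIONS[gesture_id]}")
```

In a real project, the random tensor would be replaced by preprocessed camera frames, the model would be trained on the collected dataset, and each action string would trigger the corresponding behavior on the augmented content inside the AR engine.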
Many libraries can be used for this task. Details on a few commonly used ones are shared in the table below:
| Library | Engine | AR type |
| --- | --- | --- |
| Vuforia | Unity | Marker-based |
| OpenCV | Unity | Markerless |
| Manomotion | Unity | Markerless |
Let us take a deeper look at each of these libraries:
Vuforia: A marker-based library that can be directly imported into a Unity project. Since it is marker-based, every gesture must be registered as an image target (marker) that the camera then detects before performing the associated action. Because the hand has to match a stored image rather than being tracked dynamically, gesture detection tends to be less reliable than with the other libraries.
OpenCV: A computer vision library that can be imported into a Unity project through the Unity Asset Store. Using this library, you can build markerless hand-tracking and gesture-detection models, and the detected gestures can then be used to implement interactivity. OpenCV offers a great deal of flexibility: you can plug in existing gesture-detection models or train your own. If you want to read more on hand gesture detection in OpenCV, visit this Educative Answer. A minimal detection sketch follows this list.
Manomotion: An SDK that can be imported into a Unity project through its package manager. It is trickier to set up than the others, but it provides built-in markerless hand gesture detection features that can be mapped directly to AR interactions.
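As an illustration of the kind of markerless pipeline OpenCV enables, below is a minimal sketch using OpenCV's Python API (the Unity plugin exposes the same algorithms in C#). It segments skin-colored pixels, takes the largest contour as the hand, and counts deep convexity defects as a rough proxy for the gaps between fingers. The HSV range and depth threshold are assumptions you would tune for your own lighting and camera:

```python
# A minimal markerless hand-detection sketch (pip install opencv-python).
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Segment skin-colored pixels (hypothetical range; tune per setup).
    mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
        hull = cv2.convexHull(hand, returnPoints=False)
        defects = cv2.convexityDefects(hand, hull)
        if defects is not None:
            # Deep convexity defects roughly correspond to gaps between fingers.
            gaps = sum(1 for d in defects[:, 0] if d[3] / 256.0 > 20)
            cv2.putText(frame, f"finger gaps: {gaps}", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.drawContours(frame, [hand], -1, (255, 0, 0), 2)
    cv2.imshow("hand", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

Each detected gesture (here, the number of finger gaps) would then be mapped to an AR interaction in the engine, exactly as described in the integration step above.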
The following visualization will help us better understand how hand gestures work with AR features:
We have seen a simple mind map to follow when implementing hand gesture recognition in AR, along with the different approaches available. We also looked into a few libraries that can be used for hand gesture detection and how they differ. Any of the methods listed above can achieve this task; the right choice depends on your project. To recap, let's conclude this Educative Answer with a brief question.
Which of the following is a marker-based library?
OpenCV
Vuforia
Manomotion