Augmented reality (AR) is a technology that overlays digital information onto the real world. This information can be a 3D model, an image, a video, or other data. Such functionality enables users to interact with digital assets in the physical world, and one natural way to drive that interaction is through hand gestures. To implement this type of interactivity, we need computer vision models and a way to run them alongside the AR functionality. Before discussing the libraries that can be used to implement this, let us look at the general steps involved.
When developing any software, the first step is to have a clear mind map of the process to follow. The following image shows a simple mind map for this task:
Let us now walk through the mind map in detail:
Choose a library: We can use several libraries to implement hand gesture recognition. A few popular ones include Vuforia, OpenCV, and Manomotion.
Choose a hand gesture recognition model: We can take two approaches. One is to use a pre-trained model, and the other is to train a model by ourselves and then use it. Let us discuss both these options:
Pre-trained model: Many pre-trained models are readily available; the popular libraries mentioned above can be used to access them.
Train our own model: The first step is to collect or generate a dataset of labeled hand gestures. We then train a gesture recognition model on it; the model can be a convolutional neural network (CNN), a recurrent neural network (RNN), or a 3D CNN. (A minimal training-side sketch in Python follows this list.)
Integrate with AR: Use the detected gestures to drive interactions with the content we augment. The interaction can be anything we choose, but note that each gesture must be mapped to a corresponding interaction for proper AR integration, as shown in the sketch below.
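To make the "train our own model" and gesture-mapping steps more concrete, here is a minimal Python sketch, assuming PyTorch is installed. The Unity-based libraries above use C#, but the idea translates directly. The `GestureCNN` architecture, the gesture labels, and the `GESTURE_ACTIONS` mapping are illustrative assumptions, not any library's API:

```python
# A minimal sketch of a CNN gesture classifier and a gesture-to-interaction
# mapping. The architecture, image size, and gesture labels are hypothetical.
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Classifies a 64x64 grayscale hand image into one of three gestures."""
    def __init__(self, num_gestures: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Each recognized gesture must map to one AR interaction (hypothetical labels).
GESTURE_ACTIONS = {
    0: "rotate_model",   # e.g., an open palm rotates the 3D model
    1: "scale_model",    # e.g., a pinch scales the model
    2: "place_model",    # e.g., a fist anchors the model in place
}

model = GestureCNN()
frame = torch.rand(1, 1, 64, 64)              # stand-in for a camera frame
gesture_id = model(frame).argmax(dim=1).item()
print(f"Detected gesture {gesture_id} -> {GESTURE_ACTIONS[gesture_id]}")
```

In a real project, the random tensor would be replaced by preprocessed camera frames, the model would be trained on the collected dataset, and each action string would trigger the corresponding behavior on the augmented content inside the AR engine.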
Many libraries can be used for this task. Details on a few commonly used ones are shared in the table below:
| Library | Engine | AR type |
| --- | --- | --- |
| Vuforia | Unity | Marker-based |
| OpenCV | Unity | Markerless |
| Manomotion | Unity | Markerless |
Let us take a deeper look at each of these libraries:
Vuforia: A marker-based library that can be directly imported into a Unity project. Since it is marker-based, every gesture must be registered as an image target (marker) that the camera then detects before performing the associated action. Because the hand has to match a stored image rather than being tracked dynamically, gesture detection tends to be less reliable than with the other libraries.
OpenCV: A computer vision library that can be imported into a Unity project through the Unity Asset Store. Using this library, you can build markerless hand-tracking and gesture-detection models, and the detected gestures can then be used to implement interactivity. OpenCV offers a great deal of flexibility: you can plug in existing gesture-detection models or train your own. If you want to read more on hand gesture detection in OpenCV, visit this Educative Answer. A minimal detection sketch follows this list.
Manomotion: An SDK that can be imported into a Unity project through its package manager. It is trickier to set up than the others, but it provides built-in markerless hand gesture detection features that can be mapped directly to AR interactions.
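As an illustration of the kind of markerless pipeline OpenCV enables, below is a minimal sketch using OpenCV's Python API (the Unity plugin exposes the same algorithms in C#). It segments skin-colored pixels, takes the largest contour as the hand, and counts deep convexity defects as a rough proxy for the gaps between fingers. The HSV range and depth threshold are assumptions you would tune for your own lighting and camera:

```python
# A minimal markerless hand-detection sketch (pip install opencv-python).
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Segment skin-colored pixels (hypothetical range; tune per setup).
    mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
        hull = cv2.convexHull(hand, returnPoints=False)
        defects = cv2.convexityDefects(hand, hull)
        if defects is not None:
            # Deep convexity defects roughly correspond to gaps between fingers.
            gaps = sum(1 for d in defects[:, 0] if d[3] / 256.0 > 20)
            cv2.putText(frame, f"finger gaps: {gaps}", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.drawContours(frame, [hand], -1, (255, 0, 0), 2)
    cv2.imshow("hand", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

Each detected gesture (here, the number of finger gaps) would then be mapped to an AR interaction in the engine, exactly as described in the integration step above.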
The following visualization will help us better understand how hand gestures work with AR features:
We have seen a simple mind map to follow when implementing hand gesture recognition in AR, along with the different approaches available. We also looked into a few libraries that can be used for hand gesture detection and how they differ. Any of the methods listed above can achieve this task; the right choice depends on your project. To recap, let's conclude this Educative Answer with a brief question.
Which of the following is a marker-based library?
OpenCV
Vuforia
Manomotion