December 28, 2021
Simple, accurate, and efficient: Improving the way computers recognize hand gestures
In the 2002 science fiction blockbuster film "Minority Report," Tom Cruise's character John Anderton uses his hands, sheathed in special gloves, to interface with his wall-sized transparent computer screen. The computer recognizes his gestures to enlarge, zoom in, and swipe away. Although this futuristic vision for computer-human interaction is now 20 years old, today's humans still interface with computers by using a mouse, keyboard, remote control, or small touch screen. However, researchers have devoted much effort to unlocking more natural forms of communication that do not require contact between the user and the device. Voice commands are a prominent example that has found its way into modern smartphones and virtual assistants, letting us interact with and control devices through speech.
Hand gestures constitute another important mode of human communication that could be adopted for human-computer interactions. Recent progress in camera systems, image analysis, and machine learning has made optical-based gesture recognition a more attractive option in most contexts than approaches relying on wearable sensors or data gloves, as used by Anderton in "Minority Report." However, current methods are hindered by a variety of limitations, including high computational complexity, low speed, poor accuracy, or a low number of recognizable gestures. To tackle these issues, a team led by Zhiyi Yu of Sun Yat-sen University, China, recently developed a new hand gesture recognition algorithm that strikes a good balance among complexity, accuracy, and applicability. As detailed in their paper, which was published in the Journal of Electronic Imaging, the team adopted innovative strategies to overcome key challenges and realize an algorithm that can be easily applied in consumer-level devices.
One of the main features of the algorithm is adaptability to different hand types. The algorithm first tries to classify the hand type of the user as either slim, normal, or broad based on three measurements accounting for relationships between palm width, palm length, and finger length. If this classification is successful, subsequent steps in the hand gesture recognition process only compare the input gesture with stored samples of the same hand type. "Traditional simple algorithms tend to suffer from low recognition rates because they cannot cope with different hand types. By first classifying the input gesture by hand type and then using sample libraries that match this type, we can improve the overall recognition rate with almost negligible resource consumption," explains Yu.
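The idea of classifying the hand type first and then searching only the matching sample library can be illustrated with a minimal Python sketch. It is not the team's implementation: the article does not give the three measurements or their thresholds, so the ratios, the BOUNDS values, the majority vote, and the names classify_hand_type and candidate_samples are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical (low, high) bounds for each ratio; the values are illustrative
# only and are not taken from the paper.
BOUNDS = [
    (0.70, 0.90),  # palm_width / palm_length
    (0.95, 1.20),  # palm_width / finger_length
    (1.20, 1.45),  # palm_length / finger_length
]


def classify_hand_type(palm_width: float, palm_length: float,
                       finger_length: float) -> Optional[str]:
    """Return 'slim', 'normal', or 'broad', or None if no class wins."""
    if min(palm_width, palm_length, finger_length) <= 0:
        return None  # unusable measurements: classification fails
    ratios = [
        palm_width / palm_length,
        palm_width / finger_length,
        palm_length / finger_length,
    ]
    votes: Counter = Counter()
    for ratio, (low, high) in zip(ratios, BOUNDS):
        if ratio < low:
            votes["slim"] += 1
        elif ratio > high:
            votes["broad"] += 1
        else:
            votes["normal"] += 1
    label, count = votes.most_common(1)[0]
    # Assumed rule: at least two of the three ratios must agree.
    return label if count >= 2 else None


def candidate_samples(library: Dict[str, List[str]],
                      hand_type: Optional[str]) -> List[str]:
    """Pick the sample set to compare against.

    If the hand type is known, only that type's samples are used;
    otherwise every stored sample is a candidate.
    """
    if hand_type in library:
        return library[hand_type]
    return [sample for samples in library.values() for sample in samples]
```

When the classification fails, the sketch falls back to the full library, which costs more comparisons but never excludes the correct sample set.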
Another key aspect of the team's method is the use of a "shortcut feature" to perform a prerecognition step. While the recognition algorithm is capable of identifying an input gesture out of nine possible gestures, comparing all the features of the input gesture with those of the stored samples for all possible gestures would be very time-consuming. To solve this problem, the prerecognition step calculates a ratio of the area of the hand and uses this simple feature to select the three most likely gestures of the possible nine. The final gesture is then decided among these three candidates using a much more complex and high-precision feature extraction based on "Hu invariant moments." Yu says, "The gesture prerecognition step not only reduces the number of calculations and hardware resources required but also improves recognition speed without compromising accuracy."
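The two-stage scheme described above (a cheap shortcut feature to shortlist three gestures, then a more precise comparison among them) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the article does not define the area ratio, so the sketch assumes the hand area divided by its bounding-box area; it uses only the first two of Hu's seven invariant moments; and the function names, the reference dictionaries, and the plain squared-distance matching are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary image: 1 = hand pixel, 0 = background


def _points(mask: Mask) -> List[Tuple[int, int]]:
    return [(x, y) for y, row in enumerate(mask)
            for x, v in enumerate(row) if v]


def area_ratio(mask: Mask) -> float:
    """Assumed shortcut feature: hand area / bounding-box area."""
    pts = _points(mask)
    if not pts:
        return 0.0
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    box = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(pts) / box


def hu_first_two(mask: Mask) -> Tuple[float, float]:
    """First two Hu invariant moments of a binary mask."""
    pts = _points(mask)
    if not pts:
        raise ValueError("mask contains no hand pixels")
    m00 = len(pts)
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p: int, q: int) -> float:
        # Central moment, normalized so the result is scale invariant.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2)


def shortlist(ratio: float, ref_ratios: Dict[str, float],
              k: int = 3) -> List[str]:
    """Prerecognition: keep the k gestures whose stored ratio is closest."""
    return sorted(ref_ratios, key=lambda g: abs(ref_ratios[g] - ratio))[:k]


def recognize(mask: Mask, ref_ratios: Dict[str, float],
              ref_hu: Dict[str, Tuple[float, float]]) -> str:
    """Shortlist by area ratio, then pick the closest Hu feature vector."""
    candidates = shortlist(area_ratio(mask), ref_ratios)
    hu = hu_first_two(mask)

    def distance(gesture: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(hu, ref_hu[gesture]))

    return min(candidates, key=distance)
```

Only the shortlisted gestures ever reach the moment comparison, which is where the saving in calculations comes from. A production system would more likely use all seven invariants, for example through an image-processing library.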
The team tested their algorithm on both a commercial PC processor and an FPGA platform using a USB camera. They had 40 volunteers make the nine hand gestures multiple times to build up the sample library, and another 40 volunteers to determine the accuracy of the system. Overall, the results showed that the proposed approach could recognize hand gestures in real time with an accuracy exceeding 93%, even if the input gesture images were rotated, translated, or scaled. According to the researchers, future work will focus on improving the performance of the algorithm under poor lighting conditions and increasing the number of possible gestures.
Gesture recognition has many promising fields of application and could open up new ways of controlling electronic devices. A revolution in human-computer interaction might be close at hand!