New tool for virtual and augmented reality uses 'deep learning'

Future systems that allow people to interact with virtual environments will require computers to interpret the human hand's nearly endless variety and complexity of changing motions and joint angles.

In virtual and augmented reality, the user wears a headset that displays the virtual environment as video and images. Whereas augmented reality allows the user to see the real world as well as the virtual world and to interact with both, virtual reality completely immerses the user in the artificial environment.

"In both cases, these systems must be able to see and interpret what the user's hands are doing," said Karthik Ramani, Purdue University's Donald W. Feddersen Professor of Mechanical Engineering and director of the C Design Lab. "If your hands can't interact with the virtual world, you can't do anything. That's why the hands are so important."

A new system, DeepHand, uses a "convolutional neural network," a type of model loosely inspired by the human brain and capable of "deep learning," to understand the hand's nearly endless complexity of joint angles and contortions.

"We figure out where your hands are and where your fingers are and all the motions of the hands and fingers in real time," Ramani said.

A research paper about DeepHand will be presented during CVPR 2016, a computer vision conference in Las Vegas from Sunday (June 26) to July 1.

DeepHand uses a depth-sensing camera to capture the user's hand, and specialized algorithms then interpret hand motions.

"It's called a spatial user interface because you are interfacing with the computer in space instead of on a touch screen or keyboard," Ramani said. "Say the user wants to pick up items from a virtual desktop, drive a virtual car or produce virtual pottery. The hands are obviously key."

The research paper was authored by doctoral students Ayan Sinha and Chiho Choi, along with Ramani. Information about the paper is available on the C Design Lab Web site at engineering.purdue.edu/cdesign … and-pose-estimation/.

The researchers "trained" DeepHand with a database of 2.5 million hand poses and configurations. The positions of finger joints are assigned specific "feature vectors" that can be quickly retrieved.

"We identify key angles in the hand, and we look at how these angles change, and these configurations are represented by a set of numbers," Sinha said.

Then, from the database, the system selects the stored hand configurations that best fit what the camera sees.

"The idea is similar to the Netflix algorithm, which is able to select recommended movies for specific customers based on a record of previous movies purchased by that customer," Ramani said.

DeepHand selects "spatial nearest neighbors" that best fit hand positions picked up by the camera. Although training the system requires substantial computing power, once the system has been trained, it can run on a standard computer.
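To make the nearest-neighbor idea concrete, here is a minimal sketch in Python. It is an illustration only and not the researchers' code: the pose labels, the four-number feature vectors, the function name nearest_poses, and the use of plain Euclidean distance are all assumptions made for the example. The real system works with learned features and a database of millions of poses.

```python
import math

def nearest_poses(query, database, k=3):
    """Return the labels of the k stored poses closest to the query vector.

    Closeness is measured with Euclidean distance, an assumption made
    for this sketch rather than a detail taken from the paper.
    """
    scored = []
    for label, vector in database:
        distance = math.dist(query, vector)
        scored.append((distance, label))
    scored.sort()  # smallest distance first
    return [label for _, label in scored[:k]]

# Hypothetical toy database: each entry pairs a pose label with a
# feature vector (here, four made-up joint angles in radians).
pose_database = [
    ("open_palm", [0.0, 0.1, 0.0, 0.1]),
    ("fist", [1.5, 1.6, 1.5, 1.4]),
    ("pinch", [0.9, 0.2, 0.1, 0.1]),
    ("point", [0.1, 1.5, 1.5, 1.4]),
]

# Hypothetical feature vector computed from one camera frame.
observed = [0.8, 0.3, 0.1, 0.0]
matches = nearest_poses(observed, pose_database, k=2)
print(matches)
```

With this toy data, the observed vector lies closest to the "pinch" entry, followed by "open_palm". A brute-force scan like this one would be slow against millions of stored poses, so a practical system would need a faster retrieval structure.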

The research has been supported in part by the National Science Foundation and Purdue's School of Mechanical Engineering.

More information: DeepHand: Robust Hand Pose Estimation by Completing a Matrix Imputed with Deep Features, engineering.purdue.edu/cdesign … /deephand_cvpr16.pdf

Provided by Purdue University