Microsoft Research made news at SIGGRAPH 2014 this week with its presentation of how to turn a regular video camera into a depth camera. "Learning to be a depth camera for close-range human capture and interaction" was presented by the research team, which represented Microsoft Research and the iCub facility at the Italian Institute of Technology (Istituto Italiano di Tecnologia). In brief, as described in MIT Technology Review, "Microsoft researchers say simple hardware changes and machine learning techniques let a regular smartphone camera act as a depth sensor." The team focused on hands and faces; they modified the camera on a smartphone to capture depth for hands and faces.

"We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired," said the authors. They used hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real time. As their video shows, they illustrated the results with varied human-computer interactions. One of the contributions their paper makes, they stated, is that "We demonstrate a new technique for turning a cheap color or monochrome camera into a depth sensor, for close-range human capture and interaction." After all, they said, "Whilst depth cameras are becoming more of a commodity, they have yet to (and arguably will never) surpass the ubiquity of regular 2D cameras, which are now used in the majority of our mobile devices and desktop computers."

The researchers also said, "Our hope is to allow practitioners to more rapidly prototype depth-based applications in a variety of new contexts." Two hardware designs for depth sensing were presented: a modified web camera for desktop depth sensing and a modified cellphone camera for mobile applications. They demonstrated efficient and accurate hand and face tracking in both scenarios.

Describing how their approach worked, they said they devised an algorithm that learns the correlation between pixel intensities and absolute depth measurements. The algorithm is implemented on conventional color or monochrome cameras. The hardware modifications required were removal of any near infrared (NIR) cut filter (typically used in regular RGB sensors), and the addition of a bandpass filter and low-cost LEDs, both operating in a narrow NIR range.
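
To give a rough feel for what learning a mapping from pixel intensity to absolute depth can look like, here is a minimal toy sketch in Python. It is not the researchers' method or code. It assumes, purely for illustration, that reflected NIR intensity falls off with the square of distance from a surface of constant reflectance, and it replaces the paper's forests with a much simpler two-stage stand-in: a coarse classification of each intensity into a bin, followed by a small linear regression inside that bin. All function names, the bin count and the noise level are hypothetical.

```python
import bisect
import random


def simulate_nir_intensity(depth_m, albedo=1.0):
    # Assumption for this toy only: inverse-square falloff, constant reflectance.
    return albedo / (depth_m * depth_m)


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_hybrid(samples, n_bins=16):
    # samples: list of (intensity, depth_in_metres) pairs.
    ordered = sorted(samples)  # sorted by intensity
    size = len(ordered) // n_bins
    bounds, models = [], []
    for b in range(n_bins):
        start = b * size
        end = len(ordered) if b == n_bins - 1 else (b + 1) * size
        chunk = ordered[start:end]
        xs = [intensity for intensity, _ in chunk]
        ys = [depth for _, depth in chunk]
        bounds.append(xs[-1])             # upper intensity bound of this bin
        models.append(fit_line(xs, ys))   # local regressor for this bin
    return bounds, models


def predict_depth(model, intensity):
    bounds, models = model
    # Stage 1: classify the intensity into a coarse bin.
    idx = min(bisect.bisect_left(bounds, intensity), len(models) - 1)
    # Stage 2: refine with that bin's linear regressor.
    slope, intercept = models[idx]
    return slope * intensity + intercept


def make_training_set(n=1600, seed=0):
    # Synthetic training data: depths from 0.2 m to 1.0 m, 1% intensity noise.
    rng = random.Random(seed)
    samples = []
    for k in range(n):
        depth = 0.2 + 0.8 * k / (n - 1)
        noisy = simulate_nir_intensity(depth) * (1.0 + rng.gauss(0.0, 0.01))
        samples.append((noisy, depth))
    return samples


if __name__ == "__main__":
    model = train_hybrid(make_training_set())
    for true_depth in (0.3, 0.5, 0.8):
        estimate = predict_depth(model, simulate_nir_intensity(true_depth))
        print(f"true {true_depth:.2f} m -> estimated {estimate:.3f} m")
```

The toy works on a single intensity value at a time and on synthetic data, so it says nothing about how well the real system performs; it only shows the general idea of pairing a coarse classification step with a finer regression step.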

Nonetheless, the authors stated that the method they described in the technical paper is not meant to serve as a general-purpose depth sensor. "Whilst this method cannot replace commodity depth sensors for general use, our hope is that it will enable 3D face and hand sensing and interactive systems in novel contexts," they said.