August 1, 2016
Microsoft Pix gives the iPhone camera an artificial brain
As a professional photographer shooting the Seattle Seahawks, Josh Weisberg dashed around the football field, constantly taking pictures in anticipation of the perfect shot.
"If you see it happen and you aren't already taking pictures, it's too late, you missed the shot," says Weisberg, who is a principal program manager in the Computational Photography Group within Microsoft's research organization. "You might take 10 pictures of nothing but get this one amazing photo of Russell Wilson throwing the ball."
We all want that amazing photo – of our kids, our friends, ourselves – and we want to take it with our smartphones. To get it, most of us click the shutter and hope for the best. All too often, the result is disappointment: eyes shut, faces blurred, heads turned, no smiles.
Microsoft Pix captures a burst of 10 frames with each shutter click – some from before the tap – and uses artificial intelligence to select up to three of the best, unique shots. Before the remaining frames are deleted, the app uses data from the entire burst to remove noise, and then intelligently brightens faces, beautifies skin and adjusts the picture's color and tone. These best, enhanced images are ready in about a second.
While the app is selecting and enhancing the best of the burst, another set of algorithms sorts through the frames to determine whether any motion would make for an interesting looping video within the image, such as a person's hair tousled by the wind or the cascade of a waterfall in the background. If so, the app will loop the specific motion for a Harry Potter-esque effect called Live Image.
Weisberg says his motivation for the app grew from his wife's frustration over the quality of the photos she took of their kids with her iPhone. That got Weisberg thinking.
"Given all that the phone knows about a user, why can't it take better photos?" he recalls thinking.
Well aware of the computational photography expertise within Microsoft's research organization, Weisberg saw a way to "bring a lot of extra value" to smartphone photography.
Easy to use
The interface for Microsoft Pix is intentionally simple – no modes or settings to select. The app's smarts operate behind the scenes, helping users take better photos with a minimalist set of tools.
"They are building this for people who aren't photographers but who like to take pictures—and would like to take better pictures—but don't want to take the time to learn what goes in to making better pictures," says Reed Hoffman, a Kansas City-based photography consultant and instructor with the Nikon School of Photography who tested beta versions of Microsoft Pix.
The app starts capturing frames the moment the viewfinder is loaded and intelligently tweaks settings for exposure and focus. If faces are detected, the entire process is optimized around them.
"We think that people are the most important subject in the photographs you take," Weisberg says. One of the algorithms running in the background, for example, is trained to detect whether eyes are open or closed. Images from the burst of frames with open eyes are ranked higher.
The open-eye detector builds on a legacy of facial recognition technology that most smartphone users know from the boxes that cameras put around faces in the viewfinder, explains Neel Joshi from Microsoft Research's graphics group. For Microsoft Pix, he and colleagues narrowed the focus of facial-recognition algorithms to look at specific properties of the face, such as whether the eyes are open or closed.
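As a rough illustration of how ranking a burst might work, the sketch below scores each frame and keeps up to three. It is not the Pix implementation: the `Frame` fields, the `eyes_weight` value and the scoring formula are all assumptions made for the example, and the real app also checks that its picks are unique, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int        # position of the frame within the burst
    eyes_open: float  # hypothetical open-eye detector output, 0.0 to 1.0
    sharpness: float  # hypothetical sharpness score, 0.0 to 1.0


def score(frame, eyes_weight=0.7):
    # Assumed weighting: open eyes count for more than sharpness.
    return eyes_weight * frame.eyes_open + (1 - eyes_weight) * frame.sharpness


def pick_best(frames, max_picks=3):
    # Rank the whole burst from best to worst and keep at most max_picks.
    ranked = sorted(frames, key=score, reverse=True)
    return ranked[:max_picks]


burst = [
    Frame(0, eyes_open=0.1, sharpness=0.9),  # sharp, but eyes closed
    Frame(1, eyes_open=0.9, sharpness=0.5),
    Frame(2, eyes_open=0.8, sharpness=0.8),
    Frame(3, eyes_open=0.2, sharpness=0.2),
]
print([f.index for f in pick_best(burst)])
```

With these made-up scores, the sharp frame with closed eyes ranks below both open-eyed frames, which is the behavior Weisberg describes.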
"There is this evolutionary process where the whole community – people here, people in academia – are building on top of what everyone built before," Joshi says. "There is a progression. A lot of the work, a lot of the intelligence, that is in Pix is really building on a long train of research at Microsoft."
Live Image, for example, stems from a curiosity-driven research project Joshi spearheaded after the first still photos with subtle motion, called "cinemagraphs," hit the internet in 2011. Those images, he notes, were made with a camera on a tripod and carefully edited in Photoshop.
"We wanted to figure out an easier way," he says.
The solution involves algorithms that smooth out the effects of shaky hands to eliminate the need for a tripod, as well as a set of tools that help users isolate regions in the sequence of frames with interesting motion to loop. This technology was described in a 2012 research paper and released as an app called Cliplets.
Researchers in the graphics group subsequently developed algorithms to automate the video looping, which were described in papers published in 2013 and 2015.
The video looping automation is the basis for the code in Live Image, but it ran too slowly for a dedicated smartphone photography app, notes Joshi.
To speed up the process, he built classifiers that selectively determine when to create a Live Image, which save processing power and time by only producing what the algorithms consider to be appealing loops. Repetitive, periodic motions such as twiddling thumbs and flapping flags loop well, but a hand thrust in one direction fails the test, for example. The classifier detects loop candidates in 50 milliseconds, and Live Images are processed within two seconds of the shutter tap.
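The difference between motion that loops well and motion that does not can be sketched with a simple heuristic. The function below is a hand-written stand-in, not the trained classifier Joshi describes: it assumes a hypothetical per-frame `signal` (say, the position of a moving region) and an arbitrary `tolerance`, and accepts a clip only if the motion ends close to where it started.

```python
def is_loop_candidate(signal, tolerance=0.2):
    """Return True if the motion ends near where it began.

    signal: hypothetical per-frame measurements of a moving region.
    tolerance: assumed cutoff, as a fraction of the total range of motion.
    """
    span = max(signal) - min(signal)
    if span == 0:
        return False  # nothing moves, so there is nothing to loop

    # Distance between the last and first frame, relative to the range.
    drift = abs(signal[-1] - signal[0]) / span
    return drift <= tolerance


flapping_flag = [0, 1, 0, -1, 0, 1, 0, -1, 0]  # back-and-forth motion
hand_thrust = [0, 1, 2, 3, 4, 5]               # moves one way only

print(is_loop_candidate(flapping_flag))
print(is_loop_candidate(hand_thrust))
```

The flag returns to its starting point, so the end of the clip can be joined to the beginning without a visible jump; the hand thrust cannot.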
Microsoft Pix also ships with Microsoft Hyperlapse for mobile, a technology that Joshi developed in 2015 to smooth and speed up video into visually appealing time lapses. Microsoft Hyperlapse was already available to Android and Windows Phone users and is now available on iOS. It employs image stabilization algorithms and intelligently selects which frames to keep for optimal flow in time lapses.
The effect "makes boring long videos more fun to watch, less nauseating and easier to share," Joshi says.
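To see why choosing frames matters, compare it with simply keeping every Nth frame. The sketch below is a simplified illustration, not the Hyperlapse algorithm: it assumes a hypothetical per-frame `shake` score (lower means steadier) and, within each group of `speedup` frames, keeps the steadiest one.

```python
def select_frames(shake, speedup):
    """Keep one frame per group of `speedup` frames: the steadiest one.

    shake: hypothetical per-frame camera-shake scores, lower is steadier.
    speedup: assumed target speed-up factor (frames per group).
    """
    kept = []
    for start in range(0, len(shake), speedup):
        window = range(start, min(start + speedup, len(shake)))
        # Pick the index with the lowest shake score in this group.
        kept.append(min(window, key=lambda i: shake[i]))
    return kept


shake_scores = [5, 1, 4, 2, 9, 0, 3, 3]
print(select_frames(shake_scores, speedup=3))
```

Keeping every third frame here would retain frames 0, 3 and 6 regardless of how shaky they are; picking within each group trades perfectly even spacing for a steadier result.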
Microsoft Pix is integrated with the iPhone camera roll and works on older iPhone models including the 5S – so there's no need to upgrade your phone simply to take better pictures, Weisberg notes.
If he went back to shooting professional football games, Weisberg would understandably still want his big camera and lenses. But for a day out with family and friends, he's now content with the smartphone in his pocket.
"Having those 10 frames – some before I even tap the shutter – and having the best one picked out for me automatically just increases the probability that I'm going to get a better photo," he says.