MIT CSAIL: Revealing hidden video from shadows
A team of researchers showed they can recover a video of motion taking place in a hidden scene by observing changes in illumination in a nearby visible region. They looked at the indirect effect on shadows and shading in an observed region.
Translation: Playing with shadows for information can be seriously rewarding. The team of researchers created a new AI algorithm that can help cameras "see" off-camera things using only moving shadows.
Their method can reconstruct a hidden video based on the shadows it casts. The result is that you can estimate what the hidden video looks like.
Hillary Grigonis in Digital Trends wrote about their research with an interesting comparison—like "reading shadow puppets in reverse." How so? "...the computer sees the bunny-shaped shadow and is then able to create an estimate of the object that created that shadow. The computer doesn't know what that object is, but can provide a rough outline of the shape."
Starting out, they were interested in solving the problem of activity taking place outside their field of view.
The authors considered the value of their research: "We have shown that cluttered scenes can be computationally turned into low-resolution mirrors without prior calibration." With just a single input video of the visible scene, they could recover a latent video of the hidden scene as well as a light transport matrix.
"We find it remarkable," they said, "that merely asking for latent factors easily expressible by a CNN [convolutional neural network] is sufficient to solve our problem, allowing us to entirely bypass challenges such as the estimation of the geometry and reflectance properties of the scene."
Posted on Dec. 6, their video is titled "Computational Mirrors: Revealing Hidden Video." Michael Zhang in PetaPixel summed up what they did in the video. "Scientists at MIT's CSAIL share how they pointed a camera at a pile of objects and then filmed the shadows created on those objects by a person moving around off-camera."
The video captions further pointed out that their method can also reconstruct the silhouette of a live-action performance from its shadows. Results at least cover the color and motion. Zhang assessed what they were able to do. "The AI analyzed the shadows and was able to reconstruct a blurry but strikingly accurate video of what the person was doing with their [sic] hands."
Potential applications? Video notes: "With further refinement, this method could allow self-driving cars to detect hidden obstacles.
Rachel Gordon, MIT CSAIL, spoke of other possibilities: elder-care centers looking out for the safety of their residents; search-and-rescue teams making use of this when having to navigate dangerous and obstructed areas.
All in all, the researchers have taken an interesting path toward grasping information beyond line of sight but others at MIT have in a sense been there, done that. Scenes outside a normal line of sight were the focus of MIT researchers seven years ago, said CSAIL's Gordon, and they then used lasers to produce 3-D images.
In the latest research effort, however, the team wanted to see what they could achieve using no special equipment. Gordon quoted the lead researcher on this. Miika Aittala, who said, "You can achieve quite a bit with non-line-of-sight imaging equipment like lasers, but in our approach you only have access to the light that's naturally reaching the camera, and you try to make the most out of the scarce information in it."
Think unscramble. The challenge was to unscramble and make sense of these lighting cues. Think algorithm. Gordon wrote that the team focused on breaking the ambiguity by specifying algorithmically that they wanted a 'scrambling' pattern that corresponds to plausible real-world shadowing and shading, to uncover the hidden video that looks like it has edges and objects that move coherently.
She explained that their algorithm trains two neural networks simultaneously. "One network produces the scrambling pattern, and the other estimates the hidden video. The networks are rewarded when the combination of these two factors reproduce the video recorded from the clutter, driving them to explain the observations with plausible hidden data."
Their paper discussing their work is called "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization," and it is on arXiv. Authors are Miika Aittala, Prafull Sharma, Lukas Murmann, Adam Yedidia, Gregory Wornell, William T. Freeman and Frédo Durand.
Reports said they would be presenting their work at the Conference on Neural Information Processing Systems (NeurIPS 2019) in Vancouver, British Columbia.
More information: Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization, arXiv:1912.02314 [cs.CV] https://arxiv.org/abs/1912.02314
© 2019 Science X Network