Do deep networks 'see' as well as humans?
A new study from the Centre for Neuroscience (CNS) at the Indian Institute of Science (IISc) explores how well deep neural networks compare to the human brain when it comes to visual perception.
Deep neural networks are machine learning systems, inspired by the network of neurons in the human brain, that can be trained to perform specific tasks. These networks have played a pivotal role in helping scientists understand how our brains perceive the things we see. Although deep networks have evolved significantly over the past decade, they are still nowhere close to matching the human brain at perceiving visual cues.
In a recent study, SP Arun, Associate Professor at CNS, and his team compared various qualitative properties of these deep networks with those of the human brain. Deep networks, although a good model for understanding how the human brain visualizes objects, work differently from it. While complex computation is trivial for them, certain tasks that are relatively easy for humans can be difficult for these networks to complete. In the study, published in Nature Communications, Arun and his team set out to understand which visual tasks these networks can perform naturally by virtue of their architecture, and which require further training.
The team studied 13 different perceptual effects and uncovered previously unknown qualitative differences between deep networks and the human brain. An example is the Thatcher effect, a phenomenon in which humans readily notice changes to local features in an upright image but find them hard to spot when the image is flipped upside down. Deep networks trained to recognize upright faces showed a Thatcher effect, in contrast to networks trained to recognize objects. Another visual property of the human brain, called mirror confusion, was also tested on these networks. To humans, mirror reflections along the vertical axis appear more similar than those along the horizontal axis. The researchers found that deep networks, too, show stronger mirror confusion for reflections along the vertical axis than for those along the horizontal axis.
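To make the idea of mirror confusion concrete, here is a minimal Python sketch of how such an effect could be quantified. It is an illustration under stated assumptions, not the procedure used in the study: the index formula, the function names and the use of raw pixel values as a stand-in for a network's internal activations are all hypothetical choices made for this example. The index is positive when an image is closer to its reflection about the vertical axis (a left-right flip) than to its reflection about the horizontal axis (an up-down flip).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flip_about_vertical_axis(image):
    """Left-right mirror image: reverse the pixels within each row."""
    return [row[::-1] for row in image]


def flip_about_horizontal_axis(image):
    """Up-down mirror image: reverse the order of the rows."""
    return image[::-1]


def flatten(image):
    """Hypothetical feature extractor: raw pixels as one flat vector.

    In a real analysis this would be replaced by the activations of a
    chosen layer of a deep network (an assumption, not shown here).
    """
    return [pixel for row in image for pixel in row]


def mirror_confusion_index(image, features=flatten):
    """Illustrative index in [-1, 1] (assumed formula, not from the article).

    Positive: the image is more similar to its vertical-axis reflection.
    Negative: the image is more similar to its horizontal-axis reflection.
    """
    original = features(image)
    d_vertical = euclidean(original, features(flip_about_vertical_axis(image)))
    d_horizontal = euclidean(original, features(flip_about_horizontal_axis(image)))
    total = d_vertical + d_horizontal
    if total == 0:
        return 0.0  # image is identical to both of its reflections
    return (d_horizontal - d_vertical) / total


# A toy 3x3 "image" that is left-right symmetric but not up-down symmetric.
toy_image = [
    [1, 2, 1],
    [0, 3, 0],
    [5, 5, 5],
]
index = mirror_confusion_index(toy_image)
print(index)
```

For the toy image, the left-right flip leaves the picture unchanged while the up-down flip does not, so the index takes its maximum value. With activations from a trained network in place of raw pixels, averaging such an index over many images would give one rough way to compare the two kinds of reflection.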
Another phenomenon peculiar to the human brain is that it focuses on coarser details first. This is known as the global advantage effect. For example, in an image of a tree, our brain would first see the tree as a whole before noticing the details of the leaves in it. Similarly, when presented with an image of a face, humans first look at the face as a whole, and then focus on finer details like the eyes, nose, mouth and so on, explains Georgin Jacob, first author and Ph.D. student at CNS. "Surprisingly, neural networks showed a local advantage," he says. This means that unlike the brain, the networks focus on the finer details of an image first. Therefore, even though these neural networks and the human brain carry out the same object recognition tasks, the steps followed by the two are very different.
"Lots of studies have been showing similarities between deep networks and brains, but no one has really looked at systematic differences," says Arun, who is the senior author of the study. Identifying these differences can push us closer to making these networks more brain-like.
Such analyses can help researchers build more robust neural networks that not only perform better but are also immune to "adversarial attacks" that aim to derail them.
More information: Georgin Jacob et al. Qualitative similarities and differences in visual object representations between brains and deep networks, Nature Communications (2021). DOI: 10.1038/s41467-021-22078-3