February 12, 2020

Bridging the gap between human and machine vision

by Kris Brewer, Massachusetts Institute of Technology

Suppose you look briefly from a few feet away at a person you have never met before. Step back a few paces and look again. Will you be able to recognize her face? "Yes, of course," you are probably thinking. If this is true, it would mean that our visual system, having seen a single image of an object such as a specific face, recognizes it robustly despite changes to the object's position and scale, for example. On the other hand, we know that state-of-the-art classifiers, such as vanilla deep networks, will fail this simple test.

To recognize a specific face under a range of transformations, neural networks need to be trained with many examples of the face under the different conditions. In other words, they can achieve invariance through memorization, but cannot do so if only one image is available. Thus, understanding how human vision can pull off this remarkable feat is relevant for engineers aiming to improve their existing classifiers. It is also important for neuroscientists modeling the primate visual system with deep networks. In particular, it is possible that the invariance with one-shot learning exhibited by biological vision requires a rather different computational strategy from that of deep networks.

A new paper in the journal Scientific Reports by Yena Han, an MIT Ph.D. candidate in electrical engineering and computer science, and colleagues, titled "Scale and translation-invariance for novel objects in human vision," examines this phenomenon more closely, with the aim of creating novel, biologically inspired networks.

"Humans can learn from very few examples, unlike deep networks. This is a huge difference with vast implications for engineering of vision systems and for understanding how human vision really works," states co-author Tomaso Poggio—director of the Center for Brains, Minds and Machines (CBMM) and the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT. "A key reason for this difference is the relative invariance of the primate visual system to scale, shift, and other transformations. Strangely, this has been mostly neglected in the AI community, in part because the psychophysical data were so far less than clear-cut. Han's work has now established solid measurements of basic invariances of human vision."

To distinguish invariance arising from intrinsic computation from invariance arising from experience and memorization, the new study measured the range of invariance in one-shot learning. The one-shot learning task presented Korean letters to human subjects who were unfamiliar with the language. Each letter was first shown a single time under one specific condition and then tested at a scale or position different from the original. The first experimental result is that, just as you guessed, humans showed significant scale-invariant recognition after only a single exposure to these novel objects. The second result is that the range of position-invariance is limited and depends on the size and placement of the objects.
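The one-shot protocol described above (store a single exemplar, then test at a different scale) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the study: the 1D binary "letters", the function names, and the use of resampling to a canonical size as a crude stand-in for built-in scale-invariance. It only shows why a plain template matcher that has seen one example tends to fail on a rescaled probe, while a matcher with scale handling built in does not.

```python
def cosine(a, b):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0


def pad(signal, length):
    """Zero-pad a pattern on the right so it has `length` samples."""
    return signal + [0] * (length - len(signal))


def resample(signal, length):
    """Nearest-neighbour rescale of a 1D pattern to `length` samples."""
    n = len(signal)
    return [signal[i * n // length] for i in range(length)]


def classify_plain(probe, exemplars):
    """Match the probe against stored exemplars with no scale handling."""
    size = max(len(probe), max(len(t) for t in exemplars.values()))
    return max(
        exemplars,
        key=lambda k: cosine(pad(probe, size), pad(exemplars[k], size)),
    )


def classify_scale_normalized(probe, exemplars):
    """Rescale the probe to each exemplar's size before matching
    (hypothetical stand-in for built-in scale-invariance)."""
    return max(
        exemplars,
        key=lambda k: cosine(resample(probe, len(exemplars[k])), exemplars[k]),
    )


# One exemplar per novel "letter": the single exposure.
exemplars = {"A": [1, 0, 0, 1], "B": [1, 1, 0, 0]}

# Test stimulus: letter "A" shown at twice the original size.
probe = [1, 1, 0, 0, 0, 0, 1, 1]

print("plain matcher:           ", classify_plain(probe, exemplars))
print("scale-normalized matcher:", classify_scale_normalized(probe, exemplars))
```

In this toy setup the plain matcher confuses the enlarged "A" with "B", because the enlarged pattern overlaps the wrong template more, whereas the scale-normalized matcher recovers the correct label from the single stored example.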

Next, Han and her colleagues ran a comparable experiment with deep neural networks designed to reproduce this human performance. The results suggest that, to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance. In addition, the limited position-invariance of human vision is better replicated when the model neurons' receptive fields grow larger the farther they are from the center of the visual field. This architecture differs from commonly used neural network models, in which an image is processed at uniform resolution with the same shared filters.
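The idea of receptive fields that grow with distance from the center of the visual field can be sketched in a few lines. This is a minimal illustration, not the authors' network: the linear growth rule, the `base_size` and `slope` parameters, and the 1D averaging are all assumptions chosen for brevity.

```python
def receptive_field_size(eccentricity, base_size=1.0, slope=0.5):
    """Hypothetical linear rule: the receptive field widens with
    distance (eccentricity) from the center of the visual field."""
    return base_size + slope * abs(eccentricity)


def foveated_pool(signal, base_size=1.0, slope=0.5):
    """Average each position over a window whose width grows with
    eccentricity, so resolution is high at the center and low at the edges."""
    center = (len(signal) - 1) / 2.0
    pooled = []
    for i in range(len(signal)):
        size = receptive_field_size(i - center, base_size, slope)
        half = int(size // 2)
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        window = signal[lo:hi]
        pooled.append(sum(window) / len(window))
    return pooled


# An alternating pattern: fine detail everywhere in the input.
signal = [float(i % 2) for i in range(21)]
pooled = foveated_pool(signal)

print("center response:", pooled[10])  # window of one sample, detail kept
print("edge response:  ", pooled[0])   # wide window, detail averaged away
```

With uniform pooling, by contrast, every position would use the same window, which corresponds to the uniform-resolution processing of commonly used models.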

"Our work provides a new understanding of the brain representation of objects under different viewpoints. It also has implications for AI, as the results provide new insights into what is a good architectural design for deep neural networks," remarks Han, CBMM researcher and lead author of the study.

More information: Yena Han et al. Scale and translation-invariance for novel objects in human vision, Scientific Reports (2020). DOI: 10.1038/s41598-019-57261-6

Journal information: Scientific Reports

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Bridging the gap between human and machine vision (2020, February 12) retrieved 25 April 2024 from https://techxplore.com/news/2020-02-bridging-gap-human-machine-vision.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
