Fooling the human via changes to images

Credit: OpenAI

Well, so much for an assumption that now sounds too easy to accept: that the magnificent human brain has it over a machine any day. Really? Do we interpret the world more accurately than a "convolutional neural network" can?

As Evan Ackerman pointed out, "when a CNN [convolutional neural network] is presented with an image, it's looking at a static grid of rectangular pixels."

We look at images and correctly recognize what they show, such as people and animals; CNNs look at things more the way computers do.

A research team is raising questions about such easy assumptions, however, by exploring what happens when humans face adversarial examples.

Adversarial examples are inputs to machine learning models that are designed to cause the models to make a mistake, and as such they could potentially be dangerous.

Simply put, "Adversarial examples are malicious inputs designed to fool machine learning models," according to a Google Research page.
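For readers who want to see the mechanics, here is a minimal sketch of one common recipe, the fast gradient sign method; it illustrates the general technique rather than the method used in this study, and the model, image and epsilon budget are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, true_label, epsilon=0.01):
    """Perturb `image` so that `model` is more likely to misclassify it.

    `model` is any differentiable classifier, `image` a (1, C, H, W) tensor
    with values in [0, 1], `true_label` the correct class index, and
    `epsilon` the maximum change allowed per pixel (all illustrative).
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([true_label]))
    loss.backward()
    # Nudge every pixel slightly in the direction that increases the loss,
    # then clamp so the result is still a valid image.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

The change to each pixel is tiny, which is why the perturbed image usually looks unremarkable to a casual viewer even though the model's prediction flips.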

As a blog post from OpenAI explained, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a 'yield' or other sign.

The researchers, in talking about machine learning models as vulnerable to adversarial examples, noted that small changes to images can cause computer vision models to make mistakes, such as identifying a school bus as an ostrich.

The blog from OpenAI referred to adversarial examples as representing a concrete problem in AI safety.

Having said that, what about adversarial examples fooling humans? Can that happen?

The team, said Evan Ackerman in IEEE Spectrum, "decided to try and figure out whether the same techniques that fool [artificial neural networks] can also fool the biological neural networks inside of our heads."

The research paper describing their work is "Adversarial Examples that Fool both Human and Computer Vision," on arXiv.

"Here, we create the first adversarial examples designed to fool humans," they wrote. They found that "adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers." (Ackerman noted that in the study, people only had between 60 and 70 milliseconds to look at each image and make a decision.)

IEEE Spectrum's Evan Ackerman discussed what they did and presented a set of two images from Google Brain to support his explanation.

Ackerman showed "a picture of a cat on the left. On the right, can you tell whether it's a picture of the same cat, or a picture of a similar looking dog? The difference between the two pictures is that the one on the right has been tweaked a bit by an algorithm to make it difficult for a type of computer model called a convolutional neural network (CNN) to be able to tell what it really is. In this case, the CNN thinks it's looking at a dog rather than a cat, but what's remarkable is that most people think the same thing."

What? How can humans make the same mistake? Ackerman said it might be possible to target the development of an adversarial image at humans "by choosing models that match the human visual system as closely as possible."
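To give a flavor of how such transfer is encouraged in practice, here is a rough, hypothetical sketch: the perturbation is optimized against several models at once, so it cannot rely on the quirks of any single network. The models, step size and budget below are placeholders, not the setup used in the paper.

```python
import torch
import torch.nn.functional as F

def ensemble_adversarial(models, image, true_label, epsilon=0.03, steps=10):
    """Perturb `image` so that several classifiers misclassify it at once.

    Summing the loss over an ensemble of models tends to yield perturbations
    that transfer beyond the ensemble, rather than exploiting one network's quirks.
    """
    adv = image.clone().detach()
    step = epsilon / steps
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(adv), torch.tensor([true_label])) for m in models)
        loss.backward()
        with torch.no_grad():
            adv = adv + step * adv.grad.sign()
            # Keep the total change within the epsilon budget and the valid pixel range.
            adv = image + (adv - image).clamp(-epsilon, epsilon)
            adv = adv.clamp(0.0, 1.0)
        adv = adv.detach()
    return adv
```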

But what exactly is messing with the human's ability to be correct? Ackerman said the researchers pointed out that "our adversarial examples are designed to fool human perception, so we should be careful using subjective perception to understand how they work."

He said they were willing to make some generalizations "about a few different categories of modifications, including 'disrupting object edges, especially by mid-frequency modulations perpendicular to the edge; enhancing edges both by increasing contrast and creating texture boundaries; modifying texture; and taking advantage of dark regions in the image, where the perceptual magnitude of small perturbations can be larger.'"
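One way to make a phrase like "mid-frequency modulations" concrete is to measure how much of a perturbation's energy falls in a given spatial-frequency band. The sketch below does this with a plain Fourier transform; the band cutoffs are illustrative values, not figures from the study.

```python
import numpy as np

def band_energy_fraction(perturbation, low=0.2, high=0.6):
    """Fraction of a perturbation's energy in a chosen spatial-frequency band.

    `perturbation` is a 2-D array (adversarial image minus clean image, one
    channel); `low` and `high` are cutoffs as fractions of the Nyquist
    frequency, so roughly 0.2-0.6 stands in for "mid-frequency" here.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(perturbation))) ** 2
    h, w = perturbation.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]   # cycles per pixel, -0.5 .. 0.5
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2) / 0.5          # 0 = DC, 1 = Nyquist
    in_band = (radius >= low) & (radius <= high)
    return spectrum[in_band].sum() / spectrum.sum()
```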

How they tested: Subjects with normal or corrected-to-normal vision participated in the experiment.

"For each group, a successful adversarial image was able to fool people into choosing the wrong member of the group, by identifying it as a dog when it's actually a cat, or vice versa," Ackerman said.

Subjects were asked to classify images that appeared on the screen by pressing buttons on a response time box, said the authors.

Ackerman wrote, "The short amount of time that the image was shown mitigated the difference between how CNNs perceive the world and how humans do."

The experiment involved three groups of images: pets (cats and dogs), vegetables (cabbages and broccoli), and "hazard" (spiders and snakes).
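As a purely illustrative aside, tallying such an experiment comes down to comparing error rates on clean versus perturbed images within each group; the records and numbers in the sketch below are invented, not data from the study.

```python
# Hypothetical tally: within each image group, how often did people pick the
# wrong class on clean versus adversarially perturbed images?
responses = [
    # (group, condition, answered_correctly) -- invented records, one per trial
    ("pets", "clean", True), ("pets", "adversarial", False),
    ("vegetables", "clean", True), ("vegetables", "adversarial", True),
    ("hazard", "clean", True), ("hazard", "adversarial", False),
]

for group in ("pets", "vegetables", "hazard"):
    for condition in ("clean", "adversarial"):
        trials = [ok for g, c, ok in responses if g == group and c == condition]
        error_rate = 1 - sum(trials) / len(trials)
        print(f"{group:10s} {condition:11s} error rate: {error_rate:.2f}")
```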

Ackerman's comment on the research findings was that "there's overlap between the perceptual manipulation of CNNs and the manipulation of humans. It means that machine learning techniques could potentially be used to subtly alter things like pictures or videos in a way that could change our perception of (and reaction to) them without us ever realizing what was going on."

He added that "we'll have to be careful, and keep in mind that just like those computers, sometimes we're far too easy to fool."

"Adversarial Examples that Fool both Human and Computer Vision" is by Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein, on arXiv.



More information: Adversarial Examples that Fool both Human and Computer Vision, arxiv.org/abs/1802.08195

© 2018 Tech Xplore


User comments

Mar 03, 2018
Note that 60-70 ms is pushing human perception speed, not its accuracy. Persistence of vision takes place around 80 ms, and anything going faster than that is not exactly clearly perceived because of the brain's speed limitations. As a rule of thumb, the brain integrates information at roughly 10 Hz (100 ms), so demanding faster reactions is necessarily going to sacrifice accuracy.

Meanwhile the neural network models have all the time in the world to analyze their input.

Mar 03, 2018
Nature abounds with examples of these attacks; they are commonly called camouflage.
The idea of stopping an autonomous vehicle just by putting up a stop sign may entertain limitless numbers of children in the future, so let's keep quiet about it now and not spoil their future fun.

Mar 03, 2018
"Nature abounds with examples of these attacks, they are commonly called camouflage."


Camouflage is a slightly different proposition from the kind of faults that CNNs exhibit. When things genuinely look like other things, like a stick insect looking like a stick, there is no misinterpretation - it does look like the object it's pretending to be. Crucial information is being obscured, so the false signal is stronger.

The CNNs fail in a different way. They lock onto the weaker signal, such that a pattern on the shell of a turtle triggers the CNN to return a false positive for e.g. "aubergine" while it ignores the stronger signal of the turtle - the false signal can be entirely and categorically false, something which humans would recognize as false because it doesn't even fit the scene - like looking at a tree and going "Hmmm... the taste of strawberries!"


Mar 03, 2018
Part of the reason the neural network fails is because of how much demand is put on such a little thing. The training of the network is essentially an evolutionary algorithm that rewards or punishes the network based on how it performs, so as long as the network passes the criteria imposed, it doesn't matter how it does it.

So, the network fixes on some small set of cues, and misses the obvious. After all, it does not need to care if a turtle looks like a turtle - it merely needs to check if a pixel in square A3 correlates with a pixel in square B9 in a specific way, or however the pixels may be rotated or shifted.

The network always passes the test the easiest way, which is the "wrong" way. With its limited capacity, it can adapt to any sufficiently limited test put upon it, so the problem is giving it a test which is as complex and demanding as the reality it's supposed to face.

That pretty much requires "training on the job", which is currently not possible.

Mar 04, 2018
It seems like humans misidentify things all the time, especially in fast processing situations, missing or misinterpreting a road sign, misreading a word in a line of text as being a completely different word, misreading a look someone gives you, police officers seeing someone with a weapon when they don't have one...

People are poor processors in some situations.

Mar 04, 2018
The authors miss that the eye processes most information before it reaches the brain. It's the brain's role to set the image in context, not the eye.

Mar 05, 2018
Interesting that right here we have a very good example of how highly intelligent agents have spent decades developing a sophisticated pattern recognition system using highly abstract algorithms, and yet people still want the world to believe that the superior human brain got created via random mutations and natural selection. (I use superior here in the general sense - able to do much more than just singular pattern recognition.)

Please, people, use some common sense! Just a very small example - memory and recall are functions that require logic and foreknowledge. Something abstract that just cannot come from random, purely materialistic processes. Interpretation lies on an even higher plane and is even more abstract. Where and how did that arise via purely random processes?

Mar 06, 2018
Didn't we already know we could do this, with the whole cat going up or down the stairs, and what colour is the dress (blue or gold)? We also suffer from stuff that moves too much, as our brains work in time slices (left eye then right eye), so we can easily miss detail.

I know one of the best illusions is where we have people passing a ball and nearly everyone misses the guy in the gorilla suit walking right through the whole set. You either watch where the ball is or you see the gorilla, but not both.

Laughably, we may be worse than the computer at mistaking images for something else.

Mar 07, 2018
"Where and how did that arise via purely random processes?"


Necessity is the mother of invention. If you've got a round hole and a square peg, eventually the peg gets worn round by slamming against the hole, and the hole gets broached a little bit more square - and then everyone afterwards marvels how remarkably convenient it is that the peg is shaped just so it can fit through the hole - as if by design.

Nature and "materialistic processes" are much smarter than you give them credit for. It's just not a conscious type of smarts.
