Security researchers show Google's anti-internet troll AI platform is easily deceived

University of Washington researchers have shown that Google's new machine learning-based system to identify toxic comments in online discussion forums can be bypassed by simply misspelling or adding unnecessary punctuation to abusive words, such as "idiot" or "moron."

Perspective is a project by Google's technology incubator Jigsaw, which uses artificial intelligence to combat internet trolls and promote more civil online discussion by automatically detecting online insults, harassment and abusive speech. The company launched a demonstration website on Feb. 23 that allows anyone to type in a phrase and see its "toxicity score"—a measure of how rude, disrespectful or unreasonable a particular comment is.

In a paper posted Feb. 27 on the e-print repository arXiv, the UW electrical engineers and security experts demonstrated that the early stage technology system can be deceived by using common adversarial tactics. They showed one can subtly modify a phrase that receives a high toxicity score so that it contains the same abusive language but receives a low toxicity score.

Given that news platforms such as The New York Times and other media companies are exploring how the system could help curb harassment and abuse in online comment areas or social media, the UW researchers evaluated Perspective in adversarial settings. They showed that the system is vulnerable to both missing incendiary language and falsely blocking non-abusive phrases.

"Machine learning systems are generally designed to yield the best performance in benign settings. But in real-world applications, these systems are susceptible to intelligent subversion or attacks," said senior author Radha Poovendran, chair of the UW electrical engineering department and director of the Network Security Lab. "We wanted to demonstrate the importance of designing these machine learning tools in adversarial environments. Designing a system with a benign operating environment in mind and deploying it in adversarial environments can have devastating consequences."

To solicit feedback and invite other researchers to explore the strengths and weaknesses of using machine learning as a tool to improve online discussions, Perspective developers made their experiments, models and data publicly available along with the tool itself.

In the examples in Graphic 1 on hot-button topics of climate change, Brexit and the recent U.S. election—which were taken directly from the Perspective API website—the UW team simply misspelled or added extraneous punctuation or spaces to the offending words, which yielded much lower toxicity scores. For example, simply changing "idiot" to "idiiot" reduced the toxicity rate of an otherwise identical phrase from 84 percent to 20 percent.

In the examples in Graphic 2, the researchers also showed that the system does not assign a low toxicity score to a negated version of an abusive phrase.

The researchers also observed that the duplicitous changes often transfer among different phrases—once an intentionally misspelled word was given a low toxicity score in one phrase, it was also given a low score in another phrase. That means an adversary could create a "dictionary" of changes for every word and significantly simplify the attack process.

"There are two metrics for evaluating the performance of a filtering system like a spam blocker or toxic speech detector; one is the missed detection rate, and the other is the false alarm rate," said lead author and UW electrical engineering doctoral student Hossein Hosseini. "Of course scoring the semantic toxicity of a phrase is challenging, but deploying defensive mechanisms both in algorithmic and system levels can help the usability of the system in real-world settings."

The research team suggests several techniques to improve the robustness of toxic speech detectors, including applying a spellchecking filter prior to the detection system, training the machine learning algorithm with adversarial examples and blocking suspicious users for a period of time.

"Our Network Security Lab research is typically focused on the foundations and science of cybersecurity," said Poovendran, the lead principal investigator of a recently awarded MURI grant, of which adversarial machine learning is a significant component. "But our expanded focus includes developing robust and resilient systems for machine learning and reasoning systems that need to operate in adversarial environments for a wide range of applications."

More information: Deceiving Google's Perspective API Built for Detecting Toxic Comments, arXiv:1702.08138 [cs.LG] arxiv.org/abs/1702.08138

Provided by University of Washington

Security researchers show Google's anti-internet troll AI platform is easily deceived

Google rolls out AI tool to combat online trolls

Emulating neurodegeneration and aging in artificial intelligence systems

Holographic displays offer a glimpse into an immersive future

For more open and equitable public discussions on social media, try 'meronymity'

Researchers develop energy-efficient probabilistic computer by combining CMOS with stochastic nanomagnet

New computer vision tool can count damaged buildings in crisis zones and accurately estimate bird flock sizes

Game theory research shows AI can evolve into more selfish or cooperative personalities

Proof-of-principle demonstration of 3D magnetic recording could lead to enhanced hard disk drives

High-energy-density capacitors with 2D nanomaterials could significantly enhance energy storage

Study shows potential of super grids when hurricanes overshadow solar panels

Rubber-like stretchable energy storage device fabricated with laser precision

On the trail of deepfakes, researchers identify 'fingerprints' of AI-generated video

New tech could help traveling VR gamers experience 'ludicrous speed' without motion sickness

Why can't robots outrun animals?

Virtual sensors help aerial vehicles stay aloft when rotors fail

New insights lead to better next-gen solar cells

Scientists pioneer new X-ray microscopy method for data analysis 'on the fly'

Going with the flow: Research dives into electrodes on energy storage batteries

Ultra-thin, flexible solar cells demonstrate their promise in a commercial quadcopter drone

Security researchers show Google's anti-internet troll AI platform is easily deceived

Let us know if there is a problem with our content

Thank you for taking time to provide your feedback to the editors

Share article

E-MAIL THE STORY