April 21, 2022

Researchers develop a method to keep bots from using toxic language


Researchers at the University of California San Diego have developed algorithms to rid speech generated by online bots of offensive language, on social media and elsewhere.

Chatbots using toxic language is an ongoing issue. Perhaps the most famous example is Tay, a Twitter chatbot unveiled by Microsoft in March 2016. In less than 24 hours, Tay, which was learning from conversations happening on Twitter, started repeating some of the most offensive utterances tweeted at the bot, including racist and misogynist statements.

The issue is that chatbots are often trained to repeat their interlocutors' statements during a conversation. In addition, the bots are trained on huge amounts of text, which often contain toxic language and tend to be biased: certain groups of people are overrepresented in the training set, and the bot learns language representative of that group only. An example is a bot producing negative statements about a country, propagating bias because it's learning from a training set where people have a negative view of that country.

"Industry is trying to push the limits of language models," said UC San Diego computer science Ph.D. student Canwen Xu, the paper's first author. "As researchers, we are comprehensively considering the social impact of language models and addressing concerns."

Researchers and industry professionals have tried several approaches to clean up bots' speech, all with little success. Creating a list of toxic words misses words that are not toxic when used in isolation but become offensive when used in combination with others. Trying to remove toxic speech from training data is time-consuming and far from foolproof. Developing a neural network that would identify toxic speech has similar issues.
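The weakness of the word-list approach can be illustrated with a short Python sketch. The word list, the function name passes_blocklist and the sample sentences are hypothetical, chosen only for illustration; they do not come from the researchers' work.

```python
import re

# Hypothetical, deliberately tiny word list (an assumption for illustration).
BLOCKLIST = {"idiot", "moron"}

def passes_blocklist(text):
    """Return True when no individual word in the text is on the blocklist."""
    words = re.findall(r"[a-z']+", text.lower())
    return not any(word in BLOCKLIST for word in words)

# A sentence containing a listed word is caught.
caught = not passes_blocklist("You are an idiot")

# A hostile sentence built only from individually harmless words slips through.
missed = passes_blocklist("People like you should go back where you came from")
```

Here the first sentence is flagged, while the second passes the filter because none of its words is offensive on its own.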

Instead, the UC San Diego team of computer scientists first fed toxic prompts to a pre-trained language model to get it to generate toxic content. Researchers then trained the model to predict the likelihood that content would be toxic. They call this their "evil model." They then trained a "good model," which was taught to avoid all the content highly ranked by the "evil model."
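The idea of a "good model" steering away from what an "evil model" ranks highly can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' implementation: the function name detoxify_distribution, the top_k cutoff, the candidate words and all probabilities are invented for the example, and real language models work on far larger vocabularies.

```python
def detoxify_distribution(base_probs, evil_probs, top_k=2):
    """Drop the candidates the 'evil' scorer ranks highest, then renormalize.

    base_probs: word -> probability under the ordinary language model.
    evil_probs: word -> probability under the model tuned on toxic text.
    Both dictionaries and top_k are illustrative assumptions.
    """
    # Rank candidates by the evil model's probability, highest first.
    flagged = set(sorted(evil_probs, key=evil_probs.get, reverse=True)[:top_k])
    # Keep only the candidates the evil model did not rank in its top_k.
    kept = {word: p for word, p in base_probs.items() if word not in flagged}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every candidate word was flagged")
    # Rescale the surviving probabilities so they sum to 1 again.
    return {word: p / total for word, p in kept.items()}

# Made-up numbers for the next word after a prompt.
base = {"nice": 0.4, "awful": 0.3, "stupid": 0.2, "fine": 0.1}
evil = {"nice": 0.05, "awful": 0.35, "stupid": 0.5, "fine": 0.1}
cleaned = detoxify_distribution(base, evil)
```

With these numbers, the two words the evil scorer favors most ("stupid" and "awful") are removed, and the remaining probability is shared between "nice" and "fine".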

They verified that their good model did as well as state-of-the-art methods, detoxifying speech by as much as 23 percent.

They presented their work at the AAAI Conference on Artificial Intelligence held online in March 2022.

Researchers were able to develop this solution because their work spans a wide range of expertise, said Julian McAuley, a professor in the UC San Diego Department of Computer Science and Engineering and the paper's senior author.

"Our lab has expertise in algorithmic language, in natural language processing and in algorithmic de-biasing," he said. "This problem and our solution lie at the intersection of all these topics."

However, this language model still has shortcomings. For example, the bot now shies away from discussions of under-represented groups, because the topic is often associated with hate speech and toxic content. Researchers plan to focus on this problem in future work.

"We want to make a language model that is friendlier to different groups of people," said computer science Ph.D. student Zexue He, one of the paper's co-authors.

The work has applications in areas other than chatbots, said computer science Ph.D. student and paper co-author Zhankui He. It could, for example, also be useful in diversifying and detoxifying recommendation systems.

More information: Leashing the Inner Demons: Self-Detoxification for Language Models, arXiv:2203.03072 [cs.CL] arxiv.org/abs/2203.03072

Provided by University of California - San Diego

Citation: Researchers develop a method to keep bots from using toxic language (2022, April 21) retrieved 17 July 2024 from https://techxplore.com/news/2022-04-method-bots-toxic-language.html

