
Diversifying data to beat bias

Sample and density maps of synthetic datasets projected onto CLIP latent planes, computed via the measure functions described in Section 5.1 of the paper. Top: Random sampling (Rand50) creates high-density clusters and sparsely represents extreme measure values. Bottom: QDGS (QD50) creates a more uniform distribution of synthetic data across measures. Credit: arXiv (2023). DOI: 10.48550/arxiv.2312.14369

AI holds the potential to revolutionize health care, but it also brings with it a significant challenge: bias. For instance, a dermatologist might use an AI-driven system to help identify suspicious moles. But what if the machine learning model was trained primarily on image data from lighter skin tones, and misses a common form of skin cancer on a darker-skinned patient?

This is a real-world problem. In 2021, researchers found that free image databases that could be used to train AI systems to diagnose skin cancer contain very few images of people with darker skin. AI is only as good as its data, and biased data can lead to serious outcomes, including unnecessary surgery and even missed treatable cancers.

In a new paper presented at the AAAI Conference on Artificial Intelligence, USC computer science researchers propose a novel approach to mitigate bias in training, specifically in image generation.

The researchers used a family of algorithms, called "quality-diversity algorithms" or QD algorithms, to create diverse synthetic datasets that can strategically "plug the gaps" in real-world training data.
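The full QDGS pipeline is more involved, but the core "plug the gaps" idea can be sketched simply: measure where the real data is thin along attribute axes, then spend a synthetic-data budget on the sparse regions. The sketch below is a minimal illustration under our own assumptions, not the paper's code; the two measure axes and the generate_in_bin generator hook are hypothetical placeholders.

```python
import numpy as np

def fill_gaps(real_measures, generate_in_bin, bins=10, budget=1000):
    """Request synthetic samples where the real data is sparsest.

    real_measures: (N, 2) array of per-image attribute scores in [0, 1]
                   (e.g., skin tone and age; how these scores are computed
                   is an assumption here, not the paper's exact method).
    generate_in_bin: callable((i, j), n) -> list of synthetic images,
                     a hypothetical hook into a conditional generator.
    """
    # Histogram the real data over a bins x bins grid of measure values.
    counts, _, _ = np.histogram2d(
        real_measures[:, 0], real_measures[:, 1],
        bins=bins, range=[[0.0, 1.0], [0.0, 1.0]],
    )

    # Spend the synthetic-data budget in proportion to each cell's deficit,
    # so empty or thin cells receive the most new samples.
    deficit = counts.max() - counts
    if deficit.sum() == 0:
        return []  # real data is already uniform over the grid
    weights = deficit / deficit.sum()

    synthetic = []
    for idx, w in enumerate(weights.ravel()):
        n = int(round(w * budget))
        if n > 0:
            cell = np.unravel_index(idx, counts.shape)
            synthetic.extend(generate_in_bin(cell, n))
    return synthetic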

The paper, titled "Quality-Diversity Generative Sampling for Learning with Synthetic Data," appears on the pre-print server arXiv and was lead-authored by Allen Chang, a senior double majoring in computer science and applied math.

"I think it is our responsibility as computer scientists to better protect all communities, including minority or less frequent groups, in the systems we design," said Chang. "We hope that quality-diversity optimization can help to generate fair synthetic data for broad impacts in and other types of AI systems."

Increasing fairness

While generative AI models have been used to create synthetic data in the past, "there's a danger of producing biased data, which can further bias downstream models, creating a vicious cycle," said Chang.

Quality-diversity algorithms, on the other hand, are typically used to generate diverse solutions to a problem, for instance, helping robots explore unknown environments or generating varied levels for video games. In this case, the algorithms were put to work in a new way: creating diverse synthetic datasets.
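In a typical QD algorithm such as MAP-Elites (one well-known member of the family; the paper builds on related QD methods rather than exactly this loop), candidates are binned by their "measures" and only the best candidate per bin is kept, which forces the result to spread across the measure space. A minimal sketch:

```python
import random

def map_elites(evaluate, mutate, random_solution, bins=20, iters=5000):
    """Minimal MAP-Elites loop over a single 1-D measure.

    evaluate(x) -> (quality, measure) with measure in [0, 1].
    mutate(x)   -> perturbed copy of x.
    """
    archive = {}  # bin index -> (quality, solution)

    for _ in range(iters):
        # Mutate a random elite, or start fresh while the archive is empty.
        if archive:
            parent = random.choice(list(archive.values()))[1]
            candidate = mutate(parent)
        else:
            candidate = random_solution()

        quality, measure = evaluate(candidate)
        b = min(int(measure * bins), bins - 1)

        # Keep the candidate only if its bin is empty or it beats the elite.
        if b not in archive or quality > archive[b][0]:
            archive[b] = (quality, candidate)

    return archive  # one high-quality solution per region of measure space
```

Roughly speaking, in the dataset-generation setting the "solutions" would be latent vectors fed to a pretrained image generator and the measures would be attribute scores, so a filled archive doubles as a synthetic dataset spread evenly across those attributes.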

Using this method, the team was able to generate a diverse dataset of around 50,000 images in 17 hours, roughly 20 times more efficiently than traditional "rejection sampling" methods, said Chang. The team tested the dataset on up to four measures of diversity: skin tone, gender presentation, age, and hair length.
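As the figure caption above notes, the diversity measures are computed in CLIP's latent space. One common way to turn CLIP into a scalar attribute measure, shown here as a generic sketch rather than the paper's exact prompts or model choice, is to score an image against a pair of contrasting text prompts:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def attribute_measure(image: Image.Image, neg_prompt: str, pos_prompt: str) -> float:
    """Score an image on an attribute axis defined by two contrasting prompts.

    Returns a value in [0, 1]: the probability mass CLIP assigns to
    pos_prompt over neg_prompt. The prompts are illustrative only.
    """
    inputs = processor(
        text=[neg_prompt, pos_prompt], images=image,
        return_tensors="pt", padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 1].item()

# Example: a hair-length axis, one of the article's four diversity measures.
# score = attribute_measure(img, "a person with short hair",
#                           "a person with long hair")
```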

"We found that training data produced with our method has the potential to increase fairness in the machine learning model, increasing accuracy on faces with darker skin tones, while maintaining accuracy from training on additional data," said Chang.

"This is a promising direction for augmenting models with bias-aware sampling, which we hope can help AI systems perform accurately for all users."

Notably, the method increases the representation of intersectional groups (groups that combine multiple identities) in the data: for instance, people who both have dark skin tones and wear eyeglasses, a combination that is especially scarce in traditional real-world datasets.

"While there has been previous work on leveraging QD algorithms to generate diverse content, we show for the first time that generative models can use QD to repair biased classifiers," said Nikolaidis.

"They do this by iteratively generating and rebalancing content across user-specified features, using the newly balanced content to improve classifier fairness. This work is a first step in the direction of enabling biased models to 'self-repair'' by iteratively generating and retraining on synthetic data."

More information: Allen Chang et al, Quality-Diversity Generative Sampling for Learning with Synthetic Data, arXiv (2023). DOI: 10.48550/arxiv.2312.14369

Citation: Diversifying data to beat bias (2024, February 23) retrieved 28 April 2024 from https://techxplore.com/news/2024-02-diversifying-bias.html