June 7, 2021

Improved method for generating synthetic data solves major privacy issues in research

A lack of data is a major bottleneck for many kinds of research, especially the development of better medical treatments and drugs. Medical data is extremely sensitive and, understandably, people and companies alike are often unwilling to share their information with others.

Researchers at the Finnish Center for Artificial Intelligence (FCAI) have developed a machine learning-based method that produces synthetic data on the basis of original data sets, making it possible for researchers to share their data with one another. This could solve the ongoing problem of data scarcity in medical research and other fields where information is sensitive.

The generated data preserves privacy while remaining similar enough to the original data to be used for statistical analyses. With the new method, researchers can run any number of analyses on the synthetic data without compromising the identities of the individuals involved in the original experiment.

"What we do is we tweak the original data sufficiently so that we can mathematically guarantee that no individual can be recognized," explains Samuel Kaski, Aalto University Professor and Director of FCAI, who co-authored the study.

Researchers have produced and used synthetic data before, but the new study solves a major problem with existing methods.

"We might think that just because data is synthetic, it's safe. This has not necessarily been the case, though," explains Kaski.

This is because synthetic data needs to be very similar to the original data set in order to be useful in research. In practice, it has occasionally been possible to identify individuals despite anonymization.

To address this problem, FCAI researchers make use of artificial intelligence, specifically probabilistic modeling. This enables them to use prior knowledge about the original data and the processes that have made it the way it is, without getting too close to the properties of the particular data set used as the basis for the synthetic data. Such prior knowledge could, for instance, relate to known gender differences in alcohol-related mortality, or could involve domain knowledge about how a particular data set has been collected.

Making use of prior knowledge has also made the synthetic data sets more useful for making correct statistical discoveries—even in cases where the original data set is limited in size, which is common in medical research.
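To make the idea concrete, here is a minimal sketch of how prior knowledge and a privacy guarantee can be combined in a probabilistic model. It is not the method from the study, and the article does not name the privacy mechanism used: the sketch assumes a standard Laplace mechanism applied to a single count, a toy Beta-Bernoulli model, and made-up data. The function names, the `epsilon` parameter and the prior pseudo-counts are all illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(records, epsilon, prior_a, prior_b, n_synthetic, seed=0):
    """Generate synthetic 0/1 records from a noise-protected Beta-Bernoulli model.

    records          : original binary outcomes (hypothetical sensitive data)
    epsilon          : privacy parameter of the Laplace mechanism (assumed)
    prior_a, prior_b : Beta prior pseudo-counts encoding prior knowledge
    """
    rng = random.Random(seed)
    n = len(records)  # the data set size is treated as public here

    # Changing one record changes the count by at most 1, so noise with
    # scale 1/epsilon is the standard Laplace mechanism for this statistic.
    noisy_count = sum(records) + laplace_noise(1.0 / epsilon, rng)
    noisy_count = min(max(noisy_count, 0.0), float(n))  # clamp to a valid range

    # Combine the prior pseudo-counts with the noisy statistic.
    # Simplification: the noisy count is treated as if it were exact.
    post_a = prior_a + noisy_count
    post_b = prior_b + (n - noisy_count)

    # Sample a rate from the posterior, then sample synthetic records from it.
    rate = rng.betavariate(post_a, post_b)
    return [1 if rng.random() < rate else 0 for _ in range(n_synthetic)]


original = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # made-up example data
synthetic = synthesize(original, epsilon=1.0, prior_a=3.0, prior_b=7.0,
                       n_synthetic=1000)
print(sum(synthetic) / len(synthetic))  # share of positives in the synthetic data
```

Only the noisy count ever touches the original records; everything after that step works from the noisy value and the prior. With just ten original records, the prior pseudo-counts carry much of the weight, which illustrates why prior knowledge matters most when the original data set is small.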

"Incorporating prior knowledge means we can use the method with small data sets, for which we have domain knowledge," Kaski says.

The results were published on 7 June 2021 in the journal Patterns.

More information: Joonas Jälkö et al, Privacy-preserving data sharing via probabilistic modeling, Patterns (2021). DOI: 10.1016/j.patter.2021.100271

Provided by Aalto University

Citation: Improved method for generating synthetic data solves major privacy issues in research (2021, June 7) retrieved 29 June 2024 from https://techxplore.com/news/2021-06-method-synthetic-major-privacy-issues.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
