October 29, 2020

New tool simplifies data sharing, preserves privacy

by Daniel Tkacik, Carnegie Mellon University

Meet Company X. Company X makes a popular product that lots of people—millions, in fact—use on a daily basis. One day, Company X decides it would like to improve some of the hardware in its product, which is manufactured by Vendor Y. To make these improvements, the company would need to share data with Vendor Y about how its customers use the product.

Unfortunately, that data may contain personal information about Company X's customers, so sharing it would be an invasion of their privacy. Company X doesn't want to do that, so they abandon the improvement opportunity.

According to a new study authored by researchers in Carnegie Mellon University's CyLab and IBM, a new tool can help circumvent this privacy issue in data sharing. Companies, organizations, and governments alike have to deal with this issue in today's world of Big Data. The study is being presented at this week's ACM Internet Measurement Conference, where it has been named a finalist in the conference's Best Paper Award.

One approach that has been used to avoid breaching privacy is to synthesize new data that mimic the original dataset while leaving the sensitive information out. This, however, is easier said than done.

The team of researchers created a new tool—dubbed "DoppelGANger"—that utilizes generative adversarial networks, or GANs, which employ machine learning techniques to synthesize datasets that have the same statistics as the original "training" data.

On the datasets they evaluated, models trained with DoppelGANger-produced synthetic data had up to 43 percent higher accuracy than models trained synthetic data from competing tools.

Most tools today require expertise in complex mathematical modeling, which creates a barrier for data sharing across different levels of expertise. However, DoppelGANger requires little to no prior knowledge of the dataset and its configurations due to the fact that GANs themselves are able to generalize across different datasets and use cases. This makes the tool highly flexible, the researchers say, and that flexibility is key to data sharing in cybersecurity situations.

"We believe that future organizations will need to flexibly utilize all available data to be able to react to an increasingly data-driven and automated attack landscape," says CyLab's Vyas Sekar, a professor in ECE and Lin's co-advisor. "In that sense, any tools that facilitate data sharing are going to be essential."

CyLab's Giulia Fanti, a professor in ECE and Lin's Ph.D. co-advisor, also sees the tool as being beneficial to security engineers.

"Synthetic network data can be used to help create realistic training testbeds for network security engineers without exposing real, sensitive data," says Fanti.

The team's next step is to expand to tool's capabilities, because despite its remarkable performance, it's limited to relatively simple datasets.

"Many networking datasets require significantly more complexity than DoppelGANger is currently able to handle," Lin says.

For those interested in using the tool, DoppelGANger is open-sourced on Github. The research was sponsored in part by the National Science Foundation and the Army Research Laboratory.

More information: Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions, arXiv:1909.13403 [cs.LG] arxiv.org/abs/1909.13403

Provided by Carnegie Mellon University

Citation: New tool simplifies data sharing, preserves privacy (2020, October 29) retrieved 16 August 2024 from https://techxplore.com/news/2020-10-tool-privacy.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

The real promise of synthetic data

23 shares

Feedback to editors

Engineers design tiny batteries for powering cell-sized robots

10 hours ago

Leaf-like solar concentrators promise major boost in solar efficiency

11 hours ago

Why does AI beat humans at the strategy game Diplomacy?

12 hours ago

New technique prints metal oxide thin film circuits at room temperature

13 hours ago

Studies highlight challenges and solutions in making large language models trustworthy

13 hours ago

Finding security flaws in Android ahead of malicious hackers

14 hours ago

Robot planning tool accounts for human carelessness

14 hours ago

From shrimp to steel: Introducing nature-inspired metalworking

15 hours ago

'AI Scientist' model designed to conduct scientific research autonomously

16 hours ago

Global AI adoption is outpacing risk understanding, researchers warn

16 hours ago

Load comments (0)

New tool simplifies data sharing, preserves privacy

Engineers design tiny batteries for powering cell-sized robots

Leaf-like solar concentrators promise major boost in solar efficiency

Why does AI beat humans at the strategy game Diplomacy?

New technique prints metal oxide thin film circuits at room temperature

Studies highlight challenges and solutions in making large language models trustworthy

Finding security flaws in Android ahead of malicious hackers

Robot planning tool accounts for human carelessness

From shrimp to steel: Introducing nature-inspired metalworking

'AI Scientist' model designed to conduct scientific research autonomously

Global AI adoption is outpacing risk understanding, researchers warn

The real promise of synthetic data

New method enables automated protections for sensitive data

Generating realistic stock market data for deeper financial research

How to fake a medical record in order to mitigate privacy risks

Altered data sets can still provide statistical integrity and preserve privacy

Training artificial intelligence with artificial X-rays

Robot planning tool accounts for human carelessness

'AI Scientist' model designed to conduct scientific research autonomously

Finding security flaws in Android ahead of malicious hackers

New study reveals loophole in digital wallet security—even if rightful cardholder doesn't use a digital wallet

Cybersecurity flaws could derail high-profile cycling races

New chatbot can spot cyberattacks before they start

Phys.org

Medical Xpress

Science X

New tool simplifies data sharing, preserves privacy

Engineers design tiny batteries for powering cell-sized robots

Leaf-like solar concentrators promise major boost in solar efficiency

Why does AI beat humans at the strategy game Diplomacy?

New technique prints metal oxide thin film circuits at room temperature

Studies highlight challenges and solutions in making large language models trustworthy

Finding security flaws in Android ahead of malicious hackers

Robot planning tool accounts for human carelessness

From shrimp to steel: Introducing nature-inspired metalworking

'AI Scientist' model designed to conduct scientific research autonomously

Global AI adoption is outpacing risk understanding, researchers warn

Related Stories

The real promise of synthetic data

New method enables automated protections for sensitive data

Generating realistic stock market data for deeper financial research

How to fake a medical record in order to mitigate privacy risks

Altered data sets can still provide statistical integrity and preserve privacy

Training artificial intelligence with artificial X-rays

Recommended for you

Robot planning tool accounts for human carelessness

'AI Scientist' model designed to conduct scientific research autonomously

Finding security flaws in Android ahead of malicious hackers

New study reveals loophole in digital wallet security—even if rightful cardholder doesn't use a digital wallet

Cybersecurity flaws could derail high-profile cycling races

New chatbot can spot cyberattacks before they start

Your Privacy