February 17, 2019

Altered data sets can still provide statistical integrity and preserve privacy

Synthetic networks may increase the availability of some data while still protecting individual or institutional privacy, according to a Penn State statistician.

"My key interest is in developing methodology that would enable broader sharing of confidential data in a way that can aid in scientific discovery," said Aleksandra Slavkovic, professor of statistics and associate dean for graduate education, Eberly College of Science, Penn State. "Being able to share confidential data with minimal quantifiable risk for discovery of sensitive information and still ensure statistical accuracy and integrity, is the goal."

Slavkovic has found solutions to this data privacy problem through interdisciplinary collaborations, especially with computer and social scientists. Her research focuses on various data, including network data that capture relationship information between entities such as individuals or institutions. She reported her approaches to providing synthetic networks that satisfy a notion of differential privacy today (Feb 16) during the 2019 annual meeting of the American Association for the Advancement of Science in Washington, D.C.

Differential privacy provides a mathematically provable guarantee of the level of privacy loss to individuals.

Scientists want access to data collected by others for their research, but such access could also compromise personal privacy, even after removal of so-called personally identifiable data.

"An abundance of auxiliary data is the main culprit," said Slavkovic. "With methodological and technological advances in data collection and record linkage, easier access to variety of data sources that could be linked with a dataset in hand, and funding agencies requirements to share data, the risks to data privacy are increasing. But, finding good solutions for managing privacy loss are essential for enabling sound scientific discovery."

Publicly available information from a drug trial on an HIV drug, for example, would indicate who was in the treatment group and who was in the control group. The treatment group would contain only people diagnosed with HIV and even though the data owners withheld personal particulars from that data set, some identifying information would remain. Because so much information is today available online in social media and in other datasets, it is possible to connect the dots and identify people, potentially revealing their HIV status.

"Techniques to link two data sets, say voter records and health insurance data, have greatly improved," said Slavkovic. "In one of the earliest findings, Latanya Sweeny (now at Harvard) showed that by linking these type of data, you can identify 87 percent of the people in the U.S. Census from 1990 based on their date of birth, gender and 5-digit zip code. More recently, researchers used tweets and associated Twitter metadata to show that they can identify users with 96.7 percent accuracy."

Slavkovic notes that it is not just people or institutions whose data are contained in the databases, but that people outside the database can also suffer from invasion of privacy, directly or by association. Linkages between information in a dataset and information on social media might lead to a serious privacy breech—something like HIV status or sexual orientation could have severe repercussions if revealed.

While privacy is important, collected datasets make up an essential source of information for researchers. Currently, in some cases when the data are exceptionally sensitive, researchers must physically go to the data repositories to do their research, making research more difficult and expensive.

Slavkovic is interested in network data. Information that shows the interconnectedness of people or institutions—the nodes—and the connections between nodes. Her approach is to create slightly altered, mirrored network datasets with a few of the nodes moved, connections shifted or edges altered.

"The aim is to create new networks that satisfy the rigorous differential privacy requirements and at the same time capture most of the statistical features from the original network," said Slavkovic.

These synthetic datasets might be sufficient for some researchers to satisfy their research needs. For others, it would be sufficient to test their approaches and hypothesis before having to go to the data storage site. Researchers could test code, do exploratory research and perhaps basic analysis while waiting for permission to use the original data in its repository site.

"We can't satisfy demands for all statistical analysis with the same type of altered data," said Slavkovic. "Some people will need the original data, but others might go a long way with synthetic data such as synthetic networks."

Provided by Pennsylvania State University

Citation: Altered data sets can still provide statistical integrity and preserve privacy (2019, February 17) retrieved 23 April 2024 from https://techxplore.com/news/2019-02-statistical-privacy.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Local official sues Facebook over data misuse

Feedback to editors

Researchers develop high-energy-density aqueous battery based on halogen multi-electron transfer

5 minutes ago

Extracting high-purity gold from electrical and electronic waste

1 hour ago

How potatoes, corn and beans led to breakthrough in smart windows technology

1 hour ago

A new framework to generate human motions from language prompts

5 hours ago

New metasurface innovation unlocks precision control in wireless signals

19 hours ago

Neural networks can mediate between download size and quality, according to researcher

20 hours ago

A win-win approach: Maximizing Wi-Fi performance using game theory

20 hours ago

Plasma treatment enhances electrode material for fuel cells in industry, homes and vehicles

Apr 22, 2024

People, not design features, make a robot social

Apr 22, 2024

An ultralow-concentration electrolyte for lithium-ion batteries

Apr 22, 2024

Load comments (0)

Altered data sets can still provide statistical integrity and preserve privacy

Researchers develop high-energy-density aqueous battery based on halogen multi-electron transfer

Extracting high-purity gold from electrical and electronic waste

How potatoes, corn and beans led to breakthrough in smart windows technology

A new framework to generate human motions from language prompts

New metasurface innovation unlocks precision control in wireless signals

Neural networks can mediate between download size and quality, according to researcher

A win-win approach: Maximizing Wi-Fi performance using game theory

Plasma treatment enhances electrode material for fuel cells in industry, homes and vehicles

People, not design features, make a robot social

An ultralow-concentration electrolyte for lithium-ion batteries

Local official sues Facebook over data misuse

Review reveals ambiguous understanding of genetic privacy in US study participants

Cambridge Analytica is 'only the beginning,' says data expert

US senators introduce social media privacy bill

Facebook rolls out 'privacy checkup'

Social network analysis privacy tackled

A new framework to generate human motions from language prompts

Neural networks can mediate between download size and quality, according to researcher

A coffee roastery in Finland has launched an AI-generated blend. The results were surprising

Microsoft teases lifelike avatar AI tech but gives no release date

Microsoft's AI app VASA-1 makes photographs talk and sing with believable facial expressions

To build a better AI helper, start by modeling the irrational behavior of humans

Phys.org

Medical Xpress

Science X

Altered data sets can still provide statistical integrity and preserve privacy

Researchers develop high-energy-density aqueous battery based on halogen multi-electron transfer

Extracting high-purity gold from electrical and electronic waste

How potatoes, corn and beans led to breakthrough in smart windows technology

A new framework to generate human motions from language prompts

New metasurface innovation unlocks precision control in wireless signals

Neural networks can mediate between download size and quality, according to researcher

A win-win approach: Maximizing Wi-Fi performance using game theory

Plasma treatment enhances electrode material for fuel cells in industry, homes and vehicles

People, not design features, make a robot social

An ultralow-concentration electrolyte for lithium-ion batteries

Related Stories

Local official sues Facebook over data misuse

Review reveals ambiguous understanding of genetic privacy in US study participants

Cambridge Analytica is 'only the beginning,' says data expert

US senators introduce social media privacy bill

Facebook rolls out 'privacy checkup'

Social network analysis privacy tackled

Recommended for you

A new framework to generate human motions from language prompts

Neural networks can mediate between download size and quality, according to researcher

A coffee roastery in Finland has launched an AI-generated blend. The results were surprising

Microsoft teases lifelike avatar AI tech but gives no release date

Microsoft's AI app VASA-1 makes photographs talk and sing with believable facial expressions

To build a better AI helper, start by modeling the irrational behavior of humans

Your Privacy