November 16, 2021

Big data privacy for machine learning just got 100 times cheaper

Rice University computer scientists have discovered an inexpensive way for tech companies to implement a rigorous form of personal data privacy when using or sharing large databases for machine learning.

"There are many cases where machine learning could benefit society if data privacy could be ensured," said Anshumali Shrivastava, an associate professor of computer science at Rice. "There's huge potential for improving medical treatments or finding patterns of discrimination, for example, if we could train machine learning systems to search for patterns in large databases of medical or financial records. Today, that's essentially impossible because data privacy methods do not scale."

Shrivastava and Rice graduate student Ben Coleman hope to change that with a new method they'll present this week at CCS 2021, the Association for Computing Machinery's annual flagship conference on computer and communications security. Using a technique called locality sensitive hashing, Shrivastava and Coleman found they could create a small summary of an enormous database of sensitive records. Dubbed RACE, their method draws its name from these summaries, or "repeated array of count estimators" sketches.

Coleman said RACE sketches are both safe to make publicly available and useful for algorithms that use kernel sums, one of the basic building blocks of machine learning, and for machine-learning programs that perform common tasks like classification, ranking and regression analysis. He said RACE could allow companies to both reap the benefits of large-scale, distributed machine learning and uphold a rigorous form of data privacy called differential privacy.
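To make the idea concrete, here is a minimal Python sketch of a repeated array of count estimators. It is an illustration under stated assumptions, not the authors' implementation: the class name `RaceSketch`, the choice of signed random projections as the locality sensitive hash, the `rows` and `bits` parameters, and the Laplace noise step in `privatize` are all assumptions made for this example. Each row hashes a record to one counter and increments it; a query averages the counters its own hash lands on, which approximates a kernel sum whose kernel is the hash collision probability.

```python
import random


class RaceSketch:
    """Toy sketch: `rows` independent LSH functions, each owning 2**bits counters."""

    def __init__(self, dim, rows=50, bits=2, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        # planes[r][b] is a random Gaussian direction (signed random projection LSH).
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        self.counts = [[0.0] * (2 ** bits) for _ in range(rows)]
        self.n = 0  # number of records; treated as public in this toy example

    def _bucket(self, r, x):
        # Concatenate the sign bits of the projections into a counter index.
        idx = 0
        for plane in self.planes[r]:
            dot = sum(p * v for p, v in zip(plane, x))
            idx = (idx << 1) | (1 if dot >= 0 else 0)
        return idx

    def add(self, x):
        # One pass over the data: each record increments one counter per row.
        for r in range(self.rows):
            self.counts[r][self._bucket(r, x)] += 1
        self.n += 1

    def privatize(self, epsilon, seed=1):
        # Assumption: Laplace noise on every counter. One record changes one
        # counter in each row, so the L1 sensitivity is `rows`.
        rng = random.Random(seed)
        scale = self.rows / epsilon
        for row in self.counts:
            for j in range(len(row)):
                # Difference of two exponentials with mean `scale` is Laplace(0, scale).
                row[j] += rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

    def query(self, q):
        # Average the counters that q hashes to, normalized by the record count.
        total = sum(self.counts[r][self._bucket(r, q)] for r in range(self.rows))
        return total / (self.rows * self.n)


data_rng = random.Random(42)
sketch = RaceSketch(dim=3)
for _ in range(200):
    sketch.add([1.0 + data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)])

near = sketch.query([1.0, 0.0, 0.0])   # close to the stored records
far = sketch.query([-1.0, 0.0, 0.0])   # pointing the opposite way
```

In this toy setup the sketch size depends only on `rows` and `bits`, not on the number of records, and calling `sketch.privatize(epsilon)` before release would perturb the counters while leaving the query code unchanged.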

Differential privacy, which is used by more than one tech giant, is based on the idea of adding random noise to obscure individual information.
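As a rough illustration of what "adding random noise" means, the following Python sketch applies the standard Laplace mechanism to a simple counting query. The function names, the sample `ages` list and the `epsilon` value are hypothetical and chosen only for this example; the article does not describe the specific mechanism any company uses.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29]  # hypothetical sensitive records
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy; averaged over many hypothetical releases, the noisy answer centers on the true count while any single release hides whether one person's record was included.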

"There are elegant and powerful techniques to meet differential privacy standards today, but none of them scale," Coleman said. "The computational overhead and the memory requirements grow exponentially as data becomes more dimensional."

Data is increasingly high-dimensional, meaning it contains both many observations and many individual features about each observation.

RACE sketching scales for high-dimensional data, Coleman said. The sketches are small, and the computational and memory requirements for constructing them are easy to distribute.

"Engineers today must either sacrifice their budget or the privacy of their users if they wish to use kernel sums," Shrivastava said. "RACE changes the economics of releasing high-dimensional information with differential privacy. It's simple, fast and 100 times less expensive to run than existing methods."

This is the latest innovation from Shrivastava and his students, who have developed numerous algorithmic strategies to make machine learning and data science faster and more scalable. They and their collaborators have found a more efficient way for social media companies to keep misinformation from spreading online, discovered how to train large-scale deep learning systems up to 10 times faster for "extreme classification" problems, found a way to more accurately and efficiently estimate the number of identified victims killed in the Syrian civil war, showed it's possible to train deep neural networks as much as 15 times faster on general-purpose CPUs (central processing units) than on GPUs (graphics processing units), and slashed the amount of time required for searching large metagenomic databases.

More information: Benjamin Coleman et al, A One-Pass Private Sketch for Most Machine Learning Tasks, arXiv:2006.09352 [cs.DS], arxiv.org/abs/2006.09352

Provided by Rice University

Citation: Big data privacy for machine learning just got 100 times cheaper (2021, November 16) retrieved 5 July 2024 from https://techxplore.com/news/2021-11-big-privacy-machine-cheaper.html

