October 13, 2021 feature

A model to classify financial texts while protecting users' privacy

by Ingrid Fadelli , Tech Xplore

Over the past decade or so, computer scientists developed a variety of machine learning (ML) models that can analyze large amounts of data both quickly and efficiently. To be applied in real-world situations that involve the analysis of highly sensitive data, however, these models should protect the privacy of users and prevent information from reaching third parties or from being accessed by developers.

Researchers at Manipal Institute of Technology, Carnagie Mellon University and Yildiz Technical University have recently created a privacy-enabled model for the analysis and classification of financial texts. This model, introduced in a paper pre-published on arXiv, is based on a combination of natural language processing (NLP) and machine learning techniques.

"Our paper was based on our previous work, named 'Benchmarking differential privacy and federated learning for BERT models'," Priyam Basu, one of the researchers who carried out the study, told Tech Xplore. "This work was our modest attempt at combining the domains of natural language processing (NLP) and privacy preserving machine learning."

The main objective of the recent work by Basu and his colleagues was to develop a NLP model that preserves the privacy of users, preventing their data from being accessed by others. Such a model could be particularly useful for the analysis of bank statements, tax returns and other sensitive financial documents.

"Machine Learning is majorly based on data and gives you insights and predictions and information based on data," Basu said. "Hence, it is very important for us to delve into research on how to preserve user privacy at the same time."

The framework developed by Basu and his colleagues is based on two approaches known as differential privacy and federated learning, combined with bidirectional encoder representations from transformers (BERT), which are renowned and widely used NLP models. Differential privacy techniques add a certain amount of noise to the data that is fed to the model. As a result, the party processing the data (e.g., developers, tech firms or other companies) cannot gain access to the real documents and data, as individual elements are concealed.

"Federated Learning, on the other hand, is a method of training a model on multiple decentralized devices so that no one device has access to the entire data at once," Basu explained. "BERT is a language model that gives contextualized embeddings for natural language text which can be used later on multiple tasks, such as classification, sequence tagging, semantic analysis etc."

Basu and his colleagues used the strategy they developed to train several NLP models for classifying financial texts. They then evaluated these models in a series of experiments, where they used them to analyze data from the Financial Phrase Bank dataset. Their results were highly promising, as they found that the NLP models performed as well as other state-of-the-art techniques for the analysis of financial texts, while ensuring greater data protection.

These researchers' study could have important implications for several industries, including both the financial sector and other fields that involve the analysis of sensitive user data. In the future, the new models they developed could help to significantly increase the privacy associated with NLP techniques that analyze personal and financial information.

"Classification and categorisation based on natural language data is used in a lot of domains and hence, we have provided a way to do the same while maintaining the privacy of user data, which is highly important in finance, where the data used is highly sensitive and confidential," Basu said. "We now plan to improve the accuracy achieved by our model, while not having to lose out too much on the privacy trade-off. We also hope to explore other techniques to achieve the same as well as perform other NLP tasks like NER, Semantic analysis and Clustering using DP and FL."

More information: Privacy enabled financial text classification using differential privacy and federated learning. arXiv:2110.01643 [cs.CL]. arxiv.org/abs/2110.01643

Benchmarking differential privacy and federated learning for BERT models. arXiv:2106.13973 [cs.CL]. arxiv.org/abs/2106.13973

Citation: A model to classify financial texts while protecting users' privacy (2021, October 13) retrieved 26 April 2024 from https://techxplore.com/news/2021-10-financial-texts-users-privacy.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Training enormous AI models in health care while protecting data privacy

107 shares

Feedback to editors

Scientists create new atomic clock that is both ultra-precise and sturdy

8 minutes ago

A framework to compare lithium battery testing data and results during operation

2 hours ago

New approach could make reusing captured carbon far cheaper, less energy-intensive

7 hours ago

How much energy can offshore wind farms in the U.S. produce? New study sheds light

18 hours ago

Engineers uncover key to efficient and stable organic solar cells

22 hours ago

Adobe's VideoGigaGAN uses AI to make blurry videos sharp and clear

23 hours ago

Mask-inspired perovskite smart windows enhance weather resistance and energy efficiency

Apr 25, 2024

Researchers increase storage, efficiency and durability of capacitors

Apr 25, 2024

Study explores why human-inspired machines can be perceived as eerie

Apr 25, 2024

High-energy-density capacitors with 2D nanomaterials could significantly enhance energy storage

Apr 24, 2024

Load comments (0)

A model to classify financial texts while protecting users' privacy

Scientists create new atomic clock that is both ultra-precise and sturdy

A framework to compare lithium battery testing data and results during operation

New approach could make reusing captured carbon far cheaper, less energy-intensive

How much energy can offshore wind farms in the U.S. produce? New study sheds light

Engineers uncover key to efficient and stable organic solar cells

Adobe's VideoGigaGAN uses AI to make blurry videos sharp and clear

Mask-inspired perovskite smart windows enhance weather resistance and energy efficiency

Researchers increase storage, efficiency and durability of capacitors

Study explores why human-inspired machines can be perceived as eerie

High-energy-density capacitors with 2D nanomaterials could significantly enhance energy storage

Training enormous AI models in health care while protecting data privacy

Can energy-efficient federated learning save the world?

New AI method keeps data private

Google releases TensorFlow privacy testing module

Putting machine learning in your pocket

Researchers build models using machine learning technique to enhance predictions of COVID-19 outcomes

Adobe's VideoGigaGAN uses AI to make blurry videos sharp and clear

Study explores why human-inspired machines can be perceived as eerie

Emulating neurodegeneration and aging in artificial intelligence systems

Microsoft claims that small, localized language models can be powerful as well

Scientists pioneer new X-ray microscopy method for data analysis 'on the fly'

On the trail of deepfakes, researchers identify 'fingerprints' of AI-generated video

Phys.org

Medical Xpress

Science X

A model to classify financial texts while protecting users' privacy

Scientists create new atomic clock that is both ultra-precise and sturdy

A framework to compare lithium battery testing data and results during operation

New approach could make reusing captured carbon far cheaper, less energy-intensive

How much energy can offshore wind farms in the U.S. produce? New study sheds light

Engineers uncover key to efficient and stable organic solar cells

Adobe's VideoGigaGAN uses AI to make blurry videos sharp and clear

Mask-inspired perovskite smart windows enhance weather resistance and energy efficiency

Researchers increase storage, efficiency and durability of capacitors

Study explores why human-inspired machines can be perceived as eerie

High-energy-density capacitors with 2D nanomaterials could significantly enhance energy storage

Related Stories

Training enormous AI models in health care while protecting data privacy

Can energy-efficient federated learning save the world?

New AI method keeps data private

Google releases TensorFlow privacy testing module

Putting machine learning in your pocket

Researchers build models using machine learning technique to enhance predictions of COVID-19 outcomes

Recommended for you

Adobe's VideoGigaGAN uses AI to make blurry videos sharp and clear

Study explores why human-inspired machines can be perceived as eerie

Emulating neurodegeneration and aging in artificial intelligence systems

Microsoft claims that small, localized language models can be powerful as well

Scientists pioneer new X-ray microscopy method for data analysis 'on the fly'

On the trail of deepfakes, researchers identify 'fingerprints' of AI-generated video

Your Privacy