May 1, 2024

Researchers conduct survey on deduplication systems

by David Bradley, Inderscience

A review published in the International Journal of Grid and Utility Computing has investigated ways in which the growing problem of duplicate data in computer storage systems might be addressed. Solutions to this problem could improve storage efficiency and system performance, and reduce the overall demand on resources.

Amdewar Godavari and Chapram Sudhakar of the Department of Computer Science and Engineering at the National Institute of Technology Warangal in Warangal, Telangana, India, explain how the advent of the Internet of Things (IoT) and the emergence of big data in science, engineering, medicine, and many other areas have led to a massive increase in computer storage demand.

Some researchers have suggested that by 2025, the volume of stored data will reach around 175 zettabytes (175 billion terabytes). Other research has estimated the extent of duplication in this data and suggests that around three-quarters, 75%, is wholly redundant. This redundancy leads to inefficient storage utilization and decreased performance in storage systems. Identifying the duplicate content that might be removed from a system is not a simple matter.

To address this challenge, the researchers point out that there are two general approaches. The first is data compression, which compares files and shrinks file sizes based on the identification of duplicates. Full-on data deduplication, however, computes a unique "hash value" for much larger blocks of data and compares those hashes to find blocks containing identical data, which can then be flagged for removal as appropriate. This latter approach could be used to reduce the amount of down-time or latency that would otherwise impinge on performance and access.
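The hash-and-compare idea behind deduplication can be illustrated with a minimal Python sketch. It is not taken from the survey: the 4096-byte block size, the choice of SHA-256, and the function names `deduplicate_blocks` and `restore` are all assumptions made for illustration.

```python
import hashlib


def deduplicate_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each distinct block.

    block_size and SHA-256 are illustrative assumptions, not values from the survey.
    """
    store = {}   # digest -> block contents (unique blocks only)
    recipe = []  # ordered list of digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            # First time this content is seen: store it once.
            store[digest] = block
        # Duplicates are recorded only as a reference to the stored block.
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte string from the unique blocks and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

In this sketch, a file made of three identical blocks followed by one different block would be stored as two unique blocks plus a four-entry recipe, and `restore` would return the original bytes unchanged.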

The team suggests that various chunking algorithms and machine learning-based techniques might be used to identify redundant blocks of data. Their tests show that variable-sized chunking offers better deduplication ratios than fixed-sized chunking, although this approach is slower. The algorithmic approach, however, could allow machine learning to be used in categorizing redundancy and so improve efficiency still further.
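A rough Python sketch can show why variable-sized (content-defined) chunking tends to find more duplicates than fixed-sized chunking. It is an illustration under stated assumptions, not an algorithm from the survey: the 256-byte fixed size, the 16-byte window, the `0xFF` mask, and all function names are hypothetical. A CRC32 recomputed over each window stands in for the rolling hash (such as a Rabin fingerprint) that practical systems typically use.

```python
import hashlib
import random
import zlib


def fixed_chunks(data: bytes, size: int = 256) -> list:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0xFF) -> list:
    """Cut wherever a hash of the preceding `window` bytes has its low bits zero.

    Boundaries depend only on local content, so they move with the data
    when bytes are inserted or deleted elsewhere.
    """
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        # Recomputing the hash at every position is simple but slow;
        # a true rolling hash would update it incrementally.
        if zlib.crc32(data[i - window:i]) & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def shared_chunk_count(a: list, b: list) -> int:
    """Count distinct chunks (by SHA-256 digest) that appear in both lists."""
    digests_a = {hashlib.sha256(c).digest() for c in a}
    digests_b = {hashlib.sha256(c).digest() for c in b}
    return len(digests_a & digests_b)


# Synthetic test data: pseudo-random bytes, then the same bytes with one byte prepended.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"!" + original

fixed_shared = shared_chunk_count(fixed_chunks(original), fixed_chunks(edited))
cdc_shared = shared_chunk_count(
    content_defined_chunks(original), content_defined_chunks(edited)
)
print("chunks shared after a one-byte insertion:")
print("  fixed-size:      ", fixed_shared)
print("  content-defined: ", cdc_shared)
```

Prepending a single byte shifts every fixed-size boundary, so the edited copy shares essentially no chunks with the original. The content-defined boundaries shift along with the data, so only the first chunk changes. The per-position hashing is also where the extra cost comes from, which is consistent with the speed trade-off noted above.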

More information: Amdewar Godavari et al, A survey on deduplication systems, International Journal of Grid and Utility Computing (2024). DOI: 10.1504/IJGUC.2024.137902

Provided by Inderscience

Citation: Researchers conduct survey on deduplication systems (2024, May 1) retrieved 17 July 2024 from https://techxplore.com/news/2024-05-survey-deduplication.html

