January 3, 2020

Researchers develop new open-source system to manage and share complex datasets

by Laura Arenschield, The Ohio State University

Simplifying how scientists share data — Researchers have created a new open-source data-management system for scientists, with the hope that the system might make collaboration easier. Credit: Markus Spiske on Unsplash

Data is often at the heart of science—researchers track velocities, measure light coming from stars, analyze heart rates and cholesterol levels and scan the human brain for electrical impulses.

But often, sharing that data with other scientists—or with peer-reviewed journal editors, or funders—is difficult. The software might be proprietary, and prohibitively expensive to purchase. It might take years of training for a person to be able to manage and understand the software. Or the company that created the software might have gone out of business.

A research team has developed an open-source data-management system that the scientists hope will solve all of those problems. The researchers outlined their system today in the journal PLOS ONE.

"We wanted to create a file format and a dataset model that would encapsulate the majority of datasets we work on, on all the instruments in a lab," said Philip Grandinetti, professor of chemistry at The Ohio State University and senior author of the paper. "There's this long-standing problem, pervasive among scientists, that you buy a multimillion-dollar instrument and the companies that make that instrument have their own proprietary format, and it's a nightmare to share with anyone else."

Large datasets are tricky to share, partly because software is often proprietary and partly because the files are often too large to send by email or through a cloud-based server. And even if the files can be exported in a file type that can be shared, important metadata (the information that explains what the dataset actually is) are often lost.

Their system, which Grandinetti and colleagues named the "Core Scientific Dataset Model," is designed to share complex datasets easily, without massive files that take up a lot of bandwidth and hard drive space, and without losing metadata. Consider a dataset that includes air temperature, air pressure, wind velocity and solar flux: this system can handle it. Or consider the measurements and color of light coming from a star in a distant galaxy: this system can handle it.
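As a rough illustration of the idea, and not the file format described in the paper, the weather example could be sketched in Python as a single plain-text record that keeps several measured quantities together with their units and descriptions. Every field name and value below is a made-up assumption for illustration only.

```python
import json

# Hypothetical layout, NOT the actual Core Scientific Dataset Model schema:
# one shared time axis plus several measured quantities, each stored
# alongside the metadata (name, unit) needed to interpret it.
dataset = {
    "description": "Weather-station readings (made-up values)",
    "dimensions": [
        # The axis is described compactly instead of listing every point.
        {"label": "time", "unit": "h", "start": 0.0, "increment": 1.0, "count": 3}
    ],
    "dependent_variables": [
        {"name": "air temperature", "unit": "K",
         "components": [[288.1, 288.9, 290.2]]},
        {"name": "air pressure", "unit": "kPa",
         "components": [[101.2, 101.3, 101.1]]},
        # A vector quantity: one list per component (east, north).
        {"name": "wind velocity", "unit": "m/s",
         "components": [[3.0, 2.5, 4.1], [1.0, 0.5, -0.2]]},
        {"name": "solar flux", "unit": "W/m^2",
         "components": [[410.0, 520.0, 610.0]]},
    ],
}


def coordinates(dimension):
    """Rebuild the axis values from the compact start/increment/count form."""
    return [dimension["start"] + i * dimension["increment"]
            for i in range(dimension["count"])]


def check_lengths(ds):
    """Raise ValueError if any component does not match the axis length."""
    n = ds["dimensions"][0]["count"]
    for variable in ds["dependent_variables"]:
        for component in variable["components"]:
            if len(component) != n:
                raise ValueError(f"{variable['name']}: expected {n} points")


def to_text(ds):
    """Serialize the whole record, metadata included, to JSON text."""
    return json.dumps(ds, indent=2)


def from_text(text):
    """Read a record back from JSON text."""
    return json.loads(text)


check_lengths(dataset)
restored = from_text(to_text(dataset))
names = [v["name"] for v in restored["dependent_variables"]]
```

In this sketch the units and descriptions travel in the same file as the numbers, so a collaborator who reads the text back with `from_text` gets the same record without needing the instrument vendor's software.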

"You need a dataset that is incredibly flexible in its ability to hold all those things in one file format without losing information," Grandinetti said. "So the idea is we created a model that we thought was flexible enough to do that."

The Ohio State University team, in collaboration with Professor Thomas Vosegaard at the University of Aarhus in Denmark, and Dr. Dominique Massiot at the University of Orléans in France, built software that can run on a Mac or PC. They uploaded it to the web and made the code open-source, meaning anyone can look at it, use it and download it for free. The publication in PLOS ONE is intentional: The journal is also available to anyone, free of charge.

And, the researchers hope, the system could be a simple, free way to combine multiple types of data into one place.

"We study multiple datasets as scientists—and as a scientist myself, I'd like to be able to get the data from all those files and put them together in a way that I can work with," said Deepansh Srivastava, a postdoctoral researcher in Grandinetti's group.

"Instead of looking for data and plucking it from datasets, if we could simply export it as this one file type—as a core scientific data file type—we'd be able to work in a common system."

More information: Deepansh J. Srivastava et al. Core Scientific Dataset Model: A lightweight and portable model and file format for multi-dimensional scientific data, PLOS ONE (2020). DOI: 10.1371/journal.pone.0225953

Journal information: PLoS ONE

Provided by The Ohio State University

Citation: Researchers develop new open-source system to manage and share complex datasets (2020, January 3) retrieved 29 June 2024 from https://techxplore.com/news/2020-01-open-source-complex-datasets.html

