June 9, 2020

Data management system developed to bridge the gap between databases and data science

Relational databases store data in a way that preserves the relations between data items, a property that makes them a useful tool for data scientists. There is, however, a gap between the relational database research community and data scientists, which leads to inefficient use of databases in data science. In his Ph.D. research, Mark Raasveldt set out to bridge the gap between relational databases and data science. His Ph.D. defense took place on 9 June 2020.

Integration with analytical tools

Most data scientists use analytical tools such as R, Python and C/C++ for their research. These tools are difficult to integrate with current database systems, which makes data analysis slow and cumbersome. "Data scientists have opted to reinvent database systems by developing a zoo of data management alternatives that perform similar tasks to classical database management systems, but have many of the problems that were solved in the database field decades ago," says Raasveldt.

"The database research community has made tremendous strides in developing powerful database engines that allow for efficient analytical query processing." Raasveldt set out to combine these innovations from database research with the analytical tools most commonly used by data scientists. "We investigate how we can facilitate efficient and painless integration of analytical tools and relational database management systems," says Raasveldt.

Large datasets

Another issue with using standard database systems in data science is the size of the data being handled. Most database systems are not optimized for large datasets, or for large-scale data analysis in which the data resides on remote servers. There are three main ways to connect a database system to an analytical tool, and each can be optimized for this kind of workload.

"We focus our investigation on the three primary methods for database-client integration: client-server connections, in-database processing and embedding the database inside the client application," Raasveldt explains. For each method, he studied how existing database systems implement it and evaluated how efficiently those implementations handle the large datasets and workloads that are common in data science.
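To make the embedded approach more concrete, here is a minimal sketch. It does not use DuckDB; as a stand-in it uses SQLite through Python's built-in `sqlite3` module, a widely used embedded database, so the example runs without extra packages. The table `measurements` and its values are hypothetical. The sketch contrasts pulling every row into the client and aggregating there with pushing the aggregation into the database so that only the small result is handed back. In a client-server setup, every row pulled in the first approach would also have to cross a network connection.

```python
import sqlite3

# Embedded database: the engine runs inside this Python process.
# ":memory:" keeps the data in RAM; there is no server and no socket.
con = sqlite3.connect(":memory:")

# Hypothetical example table and data.
con.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
con.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# Approach 1: fetch every row into the client and aggregate in Python.
rows = con.execute("SELECT sensor, value FROM measurements").fetchall()
sums = {}
for sensor, value in rows:
    total, count = sums.get(sensor, (0.0, 0))
    sums[sensor] = (total + value, count + 1)
client_avg = {s: total / count for s, (total, count) in sums.items()}

# Approach 2: let the database engine compute the aggregate;
# only one row per sensor is returned to the client.
db_avg = dict(
    con.execute(
        "SELECT sensor, AVG(value) FROM measurements GROUP BY sensor"
    ).fetchall()
)

print(client_avg)
print(db_avg)
con.close()
```

Both approaches produce the same averages; the difference lies in how much data has to be converted and handed to the client, which is what grows costly for large datasets.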

DuckDB

Raasveldt's final result was a new data management system, DuckDB, purpose-built for efficient and painless integration with R, Python and other analytical tools. DuckDB is intended to be a mature database system, not one used only for research purposes.

"In DuckDB, we take all the lessons that we have learned investigating database-client integrations and create an easy-to-use and highly efficient embedded database." Raasveldt will continue his work as a postdoc at CWI (Centrum Wiskunde & Informatica), where he will further develop DuckDB.

More information: DuckDB: www.duckdb.org

Provided by Leiden University

Citation: Data management system developed to bridge the gap between databases and data science (2020, June 9) retrieved 17 July 2024 from https://techxplore.com/news/2020-06-bridge-gap-databases-science.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
