January 19, 2022

Sorting out smart data

by David Bradley, Inderscience

Could scoring the contents of scientific papers on their semantics and lexicon make it possible to extract a representation of the textual experimental data they contain? That is the question a team from France hopes to answer in the International Journal of Intelligent Information and Database Systems.

Martin Lentschat of the University of Montpellier and colleagues there and at the University of Paris-Saclay explain how their approach uses the scientific publication representation (SciPuRe) to describe extracted data through ontological, lexical, and structural features based on the segments in a scientific document. The scientific literature is vast and in many ways readily accessible to experts. However, much of the information in this enormous space can only be mined, or harvested, for use by those experts, for inclusion in meta-analyses, or for feeding into advanced decision-support tools if it is first processed and the data, information, and knowledge extracted into a form that the available tools can use.

The team points out that in the biomedical research domain there has been a lot of focus on how knowledge can be extracted automatically from the published literature because of the nature of the often data-rich experimental outputs. However, in other areas, there has been a lack of tools that can home in on useful information without the need to take prior knowledge and expertise into account. Where biomedical research pivots on big data, other areas of research require smart data.

Big data needs no assessment and no scoring based on content and context. It can be pulled from a publication and processed because the prior knowledge about what the data mean is, in a sense, intrinsic to the data. Working with smart data, on the other hand, requires the data to be assessed so that irrelevant data in a publication can be discarded. The new work points to how this very process might be automated, allowing tools related to those used to handle big data in biomedical research to be used with smart data from other, less data-intensive areas of research.

The team's success with the specialist topic discussed suggests that future studies might open up the same approach to other research domains, although whether those will be equally successful remains to be seen.

"Experiments were carried out on a corpus of fifty English language scientific papers in the food packaging field," the team reports. "They revealed that article segments are an effective criterion for filtering out the majority of the quantitative entity false positives using lexical scores."

More information: Martin Lentschat et al, Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications, International Journal of Intelligent Information and Database Systems (2022). DOI: 10.1504/IJIIDS.2022.120146

Journal information: International Journal of Intelligent Information and Database Systems

Provided by Inderscience

Citation: Sorting out smart data (2022, January 19) retrieved 27 July 2024 from https://techxplore.com/news/2022-01-smart.html

