February 26, 2019

New AI approach bridges the 'slim-data gap' that can stymie deep learning approaches

by Tom Rickey, Pacific Northwest National Laboratory

Scientists have developed a deep neural network that sidesteps a problem that has bedeviled efforts to apply artificial intelligence to tackle complex chemistry—a shortage of precisely labeled chemical data. The new method gives scientists an additional tool to apply deep learning to explore drug discovery, new materials for manufacturing, and a swath of other applications.

Predicting chemical properties and reactions among millions upon millions of compounds is one of the most daunting tasks that scientists face. There is no complete source of information on which a deep learning program could draw. Usually, the lack of a vast amount of clean data is a show-stopper for a deep learning project.

Scientists at the Department of Energy's Pacific Northwest National Laboratory discovered a way around the problem. They created a pre-training system, a kind of fast-track tutorial in which they give the program some basic information about chemistry, equip it to learn from its experiences, then challenge it with huge datasets.

The work was presented at KDD2018, the Conference on Knowledge Discovery and Data Mining, in London.

Cats, dogs, and clean data

For deep learning networks, abundant and clear data has long been the key to success. In the cat vs. dog dialogue that peppers discussions of AI systems, researchers recognize the importance of "labeled data": a photo of a cat is marked a cat, a dog is marked a dog, and so on. Having many, many photos of cats and dogs, clearly marked as such, is a good example of the type of data that AI scientists like to have. The photos provide clear data points that a neural network can learn from as it begins to differentiate cats from dogs.

But chemistry is more complex than sorting cats from dogs. Hundreds of factors affect a molecule's promiscuity, and thousands of interactions can happen in a fraction of a second. AI researchers in chemistry are often faced with either small but thorough datasets or huge but inconsistent datasets: think 100 clear images of chihuahuas or 10 million images of furry blobs. Neither is ideal or even workable alone.

So the scientists created a way to bridge the gap, combining the best of "slim but good data" with "big but poor data."

The team, led by former PNNL scientist Garrett Goh, employed a technique known as rule-based supervised learning. The scientists pointed the neural network to a vast repository of chemical data known as ChEMBL and generated rule-based labels for each of its many molecules, for example by calculating the mass of the molecule. The neural network crunched through the raw data, learning principles of chemistry that relate the molecule to basic chemical fingerprints. The scientists then took the network trained on the rule-based data and presented it with the small but high-quality dataset containing the final properties to be predicted.
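To make the idea of a rule-based label concrete, here is a minimal Python sketch. It is not the team's code. It assumes each molecule is given as a flat molecular formula such as "C2H6O", and the names rule_based_mass and label_molecules are hypothetical. The point it illustrates is that the label comes from a simple rule (summing standard atomic masses), so any number of molecules can be labeled without a single experiment.

```python
import re

# Standard atomic masses (g/mol) for a few common elements.
# Assumption: only these elements appear; extend the table as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# One element symbol followed by an optional count, e.g. "C2" or "O".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def rule_based_mass(formula: str) -> float:
    """Compute molecular mass from a flat formula such as 'C2H6O'.

    Raises KeyError for an element missing from ATOMIC_MASS.
    Parentheses and charges are not handled in this sketch.
    """
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def label_molecules(formulas):
    """Attach a rule-based label (mass, rounded) to each unlabeled molecule."""
    return [(f, round(rule_based_mass(f), 3)) for f in formulas]


if __name__ == "__main__":
    # Water, ethanol, benzene: labels are computed, not measured.
    for formula, mass in label_molecules(["H2O", "C2H6O", "C6H6"]):
        print(formula, mass)
```

In the workflow the article describes, labels of this kind would serve as pre-training targets for the network before it sees the small, high-quality dataset.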

The pre-training paid off. The program, called ChemNet, matched or exceeded the accuracy of the best deep learning models currently available when analyzing molecules for their toxicity, their level of biochemical activity related to HIV, and their level of a chemical process known as solvation. The program did so with much less labeled data than its counterparts and achieved the results with less computation, which translates to faster performance.

More information: Garrett B. Goh et al. Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction. arXiv:1712.02734 [stat.ML]. arxiv.org/abs/1712.02734

Provided by Pacific Northwest National Laboratory

Citation: New AI approach bridges the 'slim-data gap' that can stymie deep learning approaches (2019, February 26) retrieved 17 July 2024 from https://techxplore.com/news/2019-02-ai-approach-bridges-slim-data-gap.html

