March 4, 2022

Machine learning gets smarter to speed up drug discovery

by Lisa Kulick, Carnegie Mellon University Mechanical Engineering

Predicting molecular properties quickly and accurately is important to advancing scientific discovery and application in areas ranging from materials science to pharmaceuticals. Because experiments and simulations to explore potential options are time-consuming and costly, scientists have investigated using machine learning (ML) methods to aid computational chemistry research. But most ML models can learn only from known, or labeled, data. This makes it nearly impossible to accurately predict the properties of novel compounds.

In an industry like drug discovery, there are millions of molecules to choose from when selecting a potential drug candidate. A prediction error as small as 1% can lead to the misidentification of more than ten thousand molecules. Improving the accuracy of ML models trained on limited data will play a vital role in developing new treatments for disease.
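The "more than ten thousand" figure follows from simple arithmetic: 1% of a library of one million molecules is already 10,000 compounds, and larger libraries push the count higher. The short sketch below makes that calculation explicit; the helper name `misidentified` and the library sizes are illustrative assumptions, not figures from the study.

```python
def misidentified(library_size: int, error_percent: int) -> int:
    """Molecules a screen would get wrong at a given whole-number error rate (%)."""
    # Integer arithmetic avoids floating-point rounding on large libraries.
    return library_size * error_percent // 100


# Hypothetical library sizes, chosen only to illustrate "millions of molecules".
for size in (1_000_000, 2_000_000, 5_000_000):
    print(f"{size:,} molecules at 1% error: {misidentified(size, 1):,} misidentified")
```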

While the amount of labeled molecule data is limited, the amount of feasible but unlabeled molecule data is growing rapidly. Researchers at Carnegie Mellon University's College of Engineering asked whether they could use this large volume of unlabeled molecules to build ML models that predict molecular properties more accurately than existing models.

Their work culminated in the development of a self-supervised learning framework named MolCLR, short for Molecular Contrastive Learning of Representations with Graph Neural Networks (GNNs). The findings were published in the journal Nature Machine Intelligence.

"MolCLR significantly boosts the performance of ML models by leveraging approximately 10 million unlabeled molecule data," said Amir Barati Farimani, assistant professor of mechanical engineering.

To explain the difference between labeled and unlabeled data, Ph.D. student Yuyang Wang suggested imagining two sets of images of dogs and cats. In one set, each animal is labeled with the name of its species. In the other set, the images carry no labels. To a human, the difference between the two types of animals might be obvious. But to a machine learning model, the difference isn't clear, so a model that depends on labels cannot reliably learn from the unlabeled set. Apply this analogy to the millions of unlabeled molecules that could take humans decades to identify manually, and the critical need for smarter machine learning tools becomes obvious.

The research team taught its MolCLR framework to use unlabeled data by contrasting positive and negative pairs of augmented molecule graph representations. Two graphs transformed from the same molecule form a positive pair, while graphs from different molecules form negative pairs. In this way, the representations of the same molecule are pulled close together, while those of distinct molecules are pushed far apart.
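One common way to turn "pull positive pairs together, push negative pairs apart" into a trainable objective is the normalized temperature-scaled cross-entropy (NT-Xent) loss popularized by the SimCLR method for images. The minimal sketch below assumes a loss of that form purely for illustration; the function names, the temperature of 0.1 and the toy two-dimensional embeddings are our own choices, and real embeddings would come from a GNN encoder rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(z_a, z_b, temperature=0.1):
    """NT-Xent loss for a batch of N molecules (illustrative sketch).

    z_a[k] and z_b[k] embed two augmented graphs of molecule k (a positive
    pair); every other embedding in the combined batch of 2N is a negative.
    """
    z = z_a + z_b                  # concatenate into 2N embeddings
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)      # index of i's positive partner
        # Scaled similarity to every other embedding (self excluded).
        logits = {k: cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        total += -(logits[j] - log_denom)  # cross-entropy with j as target
    return total / (2 * n)


views_a = [[1.0, 0.1], [0.0, 1.0]]
# Second views that point the same way as their partners...
aligned = nt_xent(views_a, [[0.9, 0.0], [0.1, 1.0]])
# ...versus second views that are swapped between the two molecules.
swapped = nt_xent(views_a, [[0.1, 1.0], [0.9, 0.0]])
print(aligned < swapped)
```

Minimizing this loss rewards an encoder for mapping two views of the same molecule to nearby points, which is why the aligned batch scores lower than the swapped one.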

The researchers applied three graph augmentations that remove small amounts of information from the unlabeled molecules: atom masking, bond deletion, and subgraph removal. In atom masking, information about individual atoms is hidden. In bond deletion, a chemical bond between atoms is erased. Subgraph removal combines both augmentations. Through these three types of changes, MolCLR was forced to learn the molecules' intrinsic information and make correlations.
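The three augmentations can be sketched on a very simple molecular graph: a list of atom symbols plus a list of bonds given as index pairs. The code below is an illustrative assumption, not MolCLR's implementation; the `[MASK]` token, the 25% default rate and all function names are our own choices. In particular, our subgraph removal masks a connected group of atoms grown outward from a random seed atom and deletes every bond touching that group, which is one plausible reading of "a combination of both augmentations".

```python
import random

MASK = "[MASK]"  # hypothetical placeholder standing in for hidden atom information


def mask_atoms(atoms, bonds, rate=0.25, rng=random):
    """Atom masking: hide the identity of a random fraction of atoms."""
    k = max(1, int(rate * len(atoms)))
    hidden = set(rng.sample(range(len(atoms)), k))
    return [MASK if i in hidden else a for i, a in enumerate(atoms)], list(bonds)


def delete_bonds(atoms, bonds, rate=0.25, rng=random):
    """Bond deletion: drop a random fraction of the bonds."""
    k = max(1, int(rate * len(bonds))) if bonds else 0
    dropped = set(rng.sample(range(len(bonds)), k))
    return list(atoms), [b for i, b in enumerate(bonds) if i not in dropped]


def remove_subgraph(atoms, bonds, rate=0.25, rng=random):
    """Subgraph removal: mask a connected group of atoms and delete its bonds."""
    target = max(1, int(rate * len(atoms)))
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    start = rng.randrange(len(atoms))
    group, queue = {start}, [start]
    # Breadth-first growth from the seed atom until the group is large enough.
    while queue and len(group) < target:
        for nbr in neighbors[queue.pop(0)]:
            if nbr not in group and len(group) < target:
                group.add(nbr)
                queue.append(nbr)
    new_atoms = [MASK if i in group else a for i, a in enumerate(atoms)]
    new_bonds = [(i, j) for i, j in bonds if i not in group and j not in group]
    return new_atoms, new_bonds


rng = random.Random(0)
atoms = ["C", "C", "O"]   # heavy atoms of ethanol
bonds = [(0, 1), (1, 2)]  # the C-C and C-O bonds
view_1 = mask_atoms(atoms, bonds, rng=rng)
view_2 = delete_bonds(atoms, bonds, rng=rng)
print(view_1, view_2)
```

Because `view_1` and `view_2` come from the same molecule, they would form a positive pair in the contrastive objective, while views of any other molecule in the batch would serve as negatives.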

When the team applied MolCLR to ClinTox, a database used to predict drug toxicity, MolCLR significantly outperformed other baseline ML models. On another database, Tox21, MolCLR again stood out from the other ML models, showing potential for distinguishing which environmental chemicals pose the most severe threats to human health.

"We've demonstrated that MolCLR bears promise for efficient molecule design," said Barati Farimani. "It can be applied to a wide variety of applications, including drug discovery, energy storage, and environmental protection."

More information: Yuyang Wang et al, Molecular contrastive learning of representations via graph neural networks, Nature Machine Intelligence (2022). DOI: 10.1038/s42256-022-00447-x

Journal information: Nature Machine Intelligence

Provided by Carnegie Mellon University Mechanical Engineering

Citation: Machine learning gets smarter to speed up drug discovery (2022, March 4) retrieved 17 July 2024 from https://techxplore.com/news/2022-03-machine-smarter-drug-discovery.html

