April 28, 2020

Machine learning tool could provide unexpected scientific insights into COVID-19

by Lawrence Berkeley National Laboratory

A team of materials scientists at Lawrence Berkeley National Laboratory (Berkeley Lab), who normally spend their time researching things like high-performance materials for thermoelectrics or battery cathodes, has built a text-mining tool in record time to help the global scientific community synthesize the mountain of scientific literature on COVID-19 being generated every day.

The tool, live at covidscholar.org, uses natural language processing techniques to not only quickly scan and search tens of thousands of research papers, but also help draw insights and connections that may otherwise not be apparent. The hope is that the tool could eventually enable "automated science."

"On Google and other search engines people search for what they think is relevant," said Berkeley Lab scientist Gerbrand Ceder, one of the project leads. "Our objective is to do information extraction so that people can find nonobvious information and relationships. That's the whole idea of machine learning and natural language processing that will be applied on these datasets."

COVIDScholar was developed in response to a March 16 call to action from the White House Office of Science and Technology Policy that asked artificial intelligence experts to develop new data and text mining techniques to help find answers to key questions about COVID-19.

The Berkeley Lab team got a prototype of COVIDScholar up and running within about a week. Now a little more than a month later, it has collected over 61,000 research papers—about 8,000 of them specifically about COVID-19 and the rest about related topics, such as other viruses and pandemics in general—and is getting more than 100 unique users every day, all by word of mouth.

And there are more papers added all the time—200 new journal articles are being published every day on the coronavirus. "Within 15 minutes of the paper appearing online, it will be on our website," said Amalie Trewartha, a postdoctoral fellow who is one of the lead developers.

This week the team released an upgraded version ready for public use—the new version gives researchers the ability to search for "related papers" and sort articles using machine-learning-based relevance tuning.

The volume of research in any scientific field, but especially this one, is daunting. "There's no doubt we can't keep up with the literature, as scientists," said Berkeley Lab scientist Kristin Persson, who is co-leading the project. "We need help to find the relevant papers quickly and to build correlations between papers that may not, on the surface, look like they're talking about the same thing."

The team has built automated scripts to grab new papers, including preprint papers, clean them up, and make them searchable. At the most basic level, COVIDScholar acts as a simple search engine, albeit a highly specialized one.
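As a rough illustration of the "clean them up and make them searchable" step, the sketch below strips markup from raw paper text and builds a small inverted index. It is not the team's code: the function names, the cleanup rules, and the two sample records are all made up for this example, and a production system would rely on a dedicated search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict

def clean(text):
    """Strip HTML-like tags and collapse whitespace (a stand-in for real cleanup)."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(papers):
    """Map each token to the set of paper ids whose cleaned text contains it."""
    index = defaultdict(set)
    for paper_id, raw in papers.items():
        for token in tokenize(clean(raw)):
            index[token].add(paper_id)
    return index

def search(index, query):
    """Return the ids of papers that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical records, invented purely for illustration.
papers = {
    "p1": "<p>Spleen   damage observed in COVID-19 patients</p>",
    "p2": "Transmission dynamics of <b>coronavirus</b> in households",
}
index = build_index(papers)
print(search(index, "spleen damage"))
```

Because the collection is restricted to one topic, even a plain keyword lookup like this returns only papers from that topic, which is the advantage Dagdelen describes next.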

"Google Scholar has millions of papers you can search through," said John Dagdelen, a UC Berkeley graduate student and Berkeley Lab researcher who is one of the lead developers. "However, when you search for 'spleen' or 'spleen damage' (and there's research coming out now that the spleen may be attacked by the virus), you'll get 100,000 papers on spleens, but they're not really relevant to what you need for COVID-19. We have the largest single-topic literature collection on COVID-19."

In addition to returning basic search results, COVIDScholar will also recommend similar abstracts and automatically sort papers in subcategories, such as testing or transmission dynamics, allowing users to do specialized searches.
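One simple way to recommend similar abstracts is to compare word-count vectors with cosine similarity. The sketch below shows that idea only; the article does not say which method COVIDScholar uses, and the abstracts and function names here are invented for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def most_similar(target_id, abstracts):
    """Return the id of the abstract closest to the target, or None if there is no other."""
    target = vectorize(abstracts[target_id])
    scored = [
        (cosine(target, vectorize(text)), pid)
        for pid, text in abstracts.items()
        if pid != target_id
    ]
    return max(scored)[1] if scored else None

# Hypothetical abstracts, invented purely for illustration.
abstracts = {
    "a1": "rapid antigen testing for coronavirus infection",
    "a2": "accuracy of antigen testing in clinics",
    "a3": "household transmission dynamics model",
}
print(most_similar("a1", abstracts))
```

Raw word counts only capture shared vocabulary; learned representations, such as those Dagdelen describes, can also relate abstracts that use different words for the same idea.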

Now, after having spent the first few weeks setting up the infrastructure to collect, clean, and collate the data, the team is tackling the next phase. "We're ready to make big progress in terms of the natural language processing for 'automated science,'" Dagdelen said.

For example, they can train their algorithms to look for unnoticed connections between concepts. "You can use the generated representations for concepts from the machine learning models to find similarities between things that don't actually occur together in the literature, so you can find things that should be connected but haven't been yet," Dagdelen said.
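The idea Dagdelen describes, that two concepts can look similar even if they never appear together, can be shown with a toy example. The sketch below represents each word by the words it appears alongside and compares those context vectors. It is a simplified stand-in, not the team's model, and the sentences and concept names are made up.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical tokenized sentences, invented purely for illustration.
sentences = [
    ["drug_a", "inhibits", "protease"],
    ["drug_b", "inhibits", "protease"],
    ["mask", "reduces", "transmission"],
]

def context_vectors(sentences):
    """Represent each word by counts of the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for word, neighbor in permutations(sentence, 2):
            vectors[word][neighbor] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

vectors = context_vectors(sentences)

# The two drugs never share a sentence...
co_occur = any("drug_a" in s and "drug_b" in s for s in sentences)
# ...yet their contexts are identical, so their vectors point the same way.
similarity = cosine(vectors["drug_a"], vectors["drug_b"])
unrelated = cosine(vectors["drug_a"], vectors["mask"])
print(co_occur, similarity, unrelated)
```

Here the two made-up drugs score as highly similar because both appear with "inhibits" and "protease", while neither resembles "mask". At the scale of tens of thousands of papers, the same principle can surface pairs of concepts that no single paper has linked.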

Another aspect is working with researchers in Berkeley Lab's Environmental Genomics and Systems Biology Division and UC Berkeley's Innovative Genomics Institute to improve COVIDScholar's algorithms. "We're linking up the unsupervised machine learning that we're doing with what they've been working on, organizing all the information around the genetic links between diseases and human phenotypes, and the possible ways we can discover new connections within our own data," Dagdelen said.

The entire tool runs on the supercomputers of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science user facility located at Berkeley Lab. That synergy across disciplines—from biosciences to computing to materials science—is what made this project possible. The online search engine and portal are powered by the Spin cloud platform at NERSC; lessons learned from the successful operations of the Materials Project, serving millions of data records per day to users, informed development of COVIDScholar.

"It couldn't have happened somewhere else," said Trewartha. "We're making progress much faster than would've been possible elsewhere. It's the story of Berkeley Lab really. Working with our colleagues at NERSC, in Biosciences [Area of Berkeley Lab], at UC Berkeley, we're able to iterate on our ideas quickly."

Also key is that the group has built essentially the same tool for materials science, called MatScholar, a project supported by the Toyota Research Institute and Shell. "The main reason this could all be done so fast is this team had three years of experience doing natural language processing for materials science," Ceder said.

They published a study in Nature last year in which they showed that an algorithm with no training in materials science could uncover new scientific knowledge. The algorithm scanned the abstracts of 3.3 million published materials science papers and then analyzed relationships between words; it was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as candidates for thermoelectric materials.

Beyond aiding in the effort to combat COVID-19, the team believes they will also be able to learn a lot about text mining. "This is a test case of whether an algorithm can be better and faster at information assimilation than just all of us reading a bunch of papers," Ceder said.

Journal information: Nature

Provided by Lawrence Berkeley National Laboratory

Citation: Machine learning tool could provide unexpected scientific insights into COVID-19 (2020, April 28) retrieved 29 June 2024 from https://techxplore.com/news/2020-04-machine-tool-unexpected-scientific-insights.html

