An open-source machine learning framework to carry out systematic reviews

When scientists carry out research on a given topic, they often start by reviewing previous study findings. Conducting systematic literature reviews or meta-analyses can be very challenging and time-consuming, because the body of published research is vast and much of it may not be relevant to a researcher's work.

Researchers at Utrecht University have recently developed a machine learning framework that could significantly speed up this process, by automatically browsing through numerous past studies and compiling high-quality literature reviews. This framework, called ASReview, could prove particularly useful for conducting research during the COVID-19 pandemic.

"Researchers and experts face a major challenge to stay up-to-date with the latest developments in their field nowadays," Jonathan de Bruin, lead engineer involved in the study, told TechXplore. "Reading all the new literature in their field is a very time-consuming task, especially when you want to do this systematically. Those systematic ways of reading literature, called systematic reviews, often lead to impactful scientific publications because they are exhaustive summaries of current evidence."

Professor Rens van de Schoot, one of the researchers who developed ASReview, has carried out several literature reviews throughout his academic career and was thus well aware of how time-consuming the review process can be. In collaboration with experts in machine learning, engineering and information management at Utrecht University, he set out to develop a tool that would significantly speed up the process of conducting systematic reviews and meta-analyses.

The machine learning framework created by de Bruin, van de Schoot and their colleagues is optimized to find a metaphorical 'needle' or multiple 'needles' in a haystack. As scientists conduct large amounts of research about a variety of different topics, automatically identifying the most relevant studies about a given topic can be highly valuable. To do this, de Bruin, van de Schoot and their colleagues trained their machine learning model using an interactive approach called active learning.

"In classical review processes, a researcher is manually presented with an article and needs to decide whether it is relevant or not, and one generally continues exploring until he/she viewed all relevant articles," de Bruin said. "The challenge for our machine learning framework is to minimize the number of irrelevant articles shown to the researcher, which can save a lot of time in the literature review process."
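
To make the idea of active learning more concrete, the loop described above can be sketched in a few lines of Python. This is a toy illustration only, not ASReview's code or API: the record titles, the simple word-count model, and the names `train`, `score`, `screen` and `ask_reviewer` are all assumptions made for the example. The model is retrained after every decision, and the record it currently ranks as most likely to be relevant is shown to the reviewer next.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Count words per class (relevant / irrelevant) from the labeled records."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labeled:
        counts[is_relevant].update(tokenize(text))
    return counts


def score(counts, text):
    """Log-odds that `text` is relevant, using add-one smoothing."""
    vocab = set(counts[True]) | set(counts[False])
    total_rel = sum(counts[True].values()) + len(vocab)
    total_irr = sum(counts[False].values()) + len(vocab)
    log_odds = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in labeled records carry no evidence
        p_rel = (counts[True][word] + 1) / total_rel
        p_irr = (counts[False][word] + 1) / total_irr
        log_odds += math.log(p_rel / p_irr)
    return log_odds


def screen(records, ask_reviewer, seed_labels):
    """Active learning loop: retrain, show the top-ranked unseen record, repeat."""
    labeled = list(seed_labels)
    seen = {text for text, _ in labeled}
    unseen = [r for r in records if r not in seen]
    order = []
    while unseen:
        model = train(labeled)
        best = max(unseen, key=lambda r: score(model, r))
        unseen.remove(best)
        label = ask_reviewer(best)  # the human decision, simulated below
        labeled.append((best, label))
        order.append((best, label))
    return order


# Hypothetical records; the True/False values stand in for the reviewer's judgment.
RECORDS = {
    "coronavirus treatment trial among hospital patients": True,
    "antiviral treatment outcomes for coronavirus patients": True,
    "coronavirus vaccine trial results": True,
    "deep learning for image classification": False,
    "soil erosion in river deltas": False,
    "bird migration patterns in europe": False,
    "image segmentation with neural networks": False,
    "economic effects of river flooding": False,
}

# Prior knowledge: one known relevant and one known irrelevant record.
seeds = [
    ("coronavirus treatment trial among hospital patients", True),
    ("deep learning for image classification", False),
]

order = screen(list(RECORDS), lambda title: RECORDS[title], seeds)

# Number of records the reviewer had to read before every relevant one was found.
screened = 1 + max(i for i, (_, label) in enumerate(order) if label)
print(f"All relevant records found after screening {screened} of {len(order)}")
```

In this simulation the loop runs until every record has been labeled so that the ordering can be inspected. In a real review, the reviewer would stop once relevant records no longer turn up near the top of the ranking, which is where the time saving comes from.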

Most existing machine learning systems are trained to accurately classify individual images, texts or other data (i.e., to place data in different categories based on their features). In contrast, the system created by de Bruin and his colleagues was trained to analyze several documents and determine which ones are relevant to a given research topic and which ones are irrelevant.

"The COVID-19 pandemic required medical guidelines and searches for new treatments to be developed in record time," de Bruin said. "Medical practitioners had to read the literature while non-stop working in the hospitals and had limited time to read literature. For this project, we worked together with the Allen Institute for AI, which published the largest database with academic literature on the coronavirus."

De Bruin, van de Schoot and their colleagues made their automatic system for conducting systematic reviews publicly available during the first weeks of the COVID-19 pandemic, as they felt that it could significantly speed up research about the SARS-CoV-2 virus and aid its understanding. ASReview, a user-friendly version of their system, has since been used by numerous scientists to review past studies about the new coronavirus and inform the development of more effective medical guidelines. In the future, ASReview could be used to conduct many other systematic reviews and meta-analyses, which could ultimately speed up research in a variety of fields.

"The use of interactive machine learning like active learning is ready to skyrocket in the upcoming years," de Bruin said. "It is crucial to ensure that interactive machine learning approaches are fully transparent and explainable. In the forthcoming period, we will show that this is possible to apply interactive machine learning in a responsible way in other applications like legal documents and court verdicts."

More information: An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence (2021). DOI: 10.1038/s42256-020-00287-7

Journal information: Nature Machine Intelligence