A machine-learning approach to finding treatment options for COVID-19
When the COVID-19 pandemic struck in early 2020, doctors and researchers rushed to find effective treatments. There was little time to spare. "Making new drugs takes forever," says Caroline Uhler, a computational biologist in MIT's Department of Electrical Engineering and Computer Science and the Institute for Data, Systems and Society, and an associate member of the Broad Institute of MIT and Harvard. "Really, the only expedient option is to repurpose existing drugs."
Uhler's team has now developed a machine learning-based approach to identify drugs already on the market that could potentially be repurposed to fight COVID-19, particularly in the elderly. The system accounts for changes in gene expression in lung cells caused by both the disease and aging. That combination could allow medical experts to more quickly seek drugs for clinical testing in elderly patients, who tend to experience more severe symptoms. The researchers pinpointed the protein RIPK1 as a promising target for COVID-19 drugs, and they identified three approved drugs that act on the expression of RIPK1.
The research appears today in the journal Nature Communications. Co-authors include MIT Ph.D. students Anastasiya Belyaeva, Adityanarayanan Radhakrishnan, Chandler Squires, and Karren Dai Yang, as well as Ph.D. student Louis Cammarata of Harvard University and long-term collaborator G.V. Shivashankar of ETH Zurich in Switzerland.
Early in the pandemic, it became clear that COVID-19 harmed older patients more than younger ones, on average. Uhler's team wondered why. "The prevalent hypothesis is the aging immune system," she says. But Uhler and Shivashankar suggested an additional factor: "One of the main changes in the lung that happens through aging is that it becomes stiffer."
Stiffer lung tissue shows different patterns of gene expression than the lung tissue of younger people, even in response to the same signal. "Earlier work by the Shivashankar lab showed that if you stimulate cells on a stiffer substrate with a cytokine, similar to what the virus does, they actually turn on different genes," says Uhler. "So, that motivated this hypothesis. We need to look at aging together with SARS-CoV-2: what are the genes at the intersection of these two pathways?" To select approved drugs that might act on these pathways, the team turned to big data and artificial intelligence.
The researchers zeroed in on the most promising drug repurposing candidates in three broad steps. First, they generated a large list of possible drugs using a machine-learning technique called an autoencoder. Next, they mapped the network of genes and proteins involved in both aging and SARS-CoV-2 infection. Finally, they used statistical algorithms to understand causality in that network, allowing them to pinpoint "upstream" genes that caused cascading effects throughout the network. In principle, drugs targeting those upstream genes and proteins should be promising candidates for clinical trials.
To generate an initial list of potential drugs, the team's autoencoder relied on two key datasets of gene expression patterns. One dataset showed how expression in various cell types responded to a range of drugs already on the market, and the other showed how expression responded to infection with SARS-CoV-2. The autoencoder scoured the datasets to highlight drugs whose impacts on gene expression appeared to counteract the effects of SARS-CoV-2. "This application of autoencoders was challenging and required foundational insights into the working of these neural networks, which we developed in a paper recently published in PNAS," notes Radhakrishnan.
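The idea of a drug "counteracting" the infection's effect on gene expression can be illustrated with a much simpler stand-in than the team's autoencoder. The sketch below is not the researchers' method: it assumes each drug and the infection are summarized as a vector of expression changes over the same genes, and it ranks drugs by how strongly their vector is anti-correlated with the infection's. The function name, the drug labels, and all numbers are hypothetical toy values.

```python
import numpy as np

def rank_counteracting_drugs(disease_signature, drug_signatures):
    """Sort drugs from most anti-correlated to most correlated with the disease.

    Assumption for illustration: every signature is a vector of expression
    changes measured over the same genes, in the same order.
    """
    scores = {}
    for name, signature in drug_signatures.items():
        # Pearson correlation between the drug's and the disease's changes
        scores[name] = float(np.corrcoef(disease_signature, signature)[0, 1])
    return sorted(scores.items(), key=lambda item: item[1])

# Toy expression changes over five hypothetical genes (made-up numbers)
disease = np.array([2.0, -1.0, 0.5, 1.5, -2.0])
drugs = {
    "drug_A": np.array([-1.8, 0.9, -0.4, -1.6, 2.1]),  # roughly reverses the disease
    "drug_B": np.array([1.9, -1.1, 0.6, 1.4, -1.8]),   # roughly mimics the disease
    "drug_C": np.array([0.1, 0.2, -0.1, 0.0, 0.1]),    # small, mixed changes
}

ranking = rank_counteracting_drugs(disease, drugs)
for name, score in ranking:
    print(f"{name}: correlation with disease = {score:.2f}")
```

In this toy ranking, the drug listed first is the one whose expression changes most nearly mirror those of the infection. A learned representation such as an autoencoder's would replace the raw vectors used here.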
Next, the researchers narrowed the list of potential drugs by homing in on key genetic pathways. They mapped the interactions of proteins involved in the aging and SARS-CoV-2 infection pathways. Then they identified areas of overlap between the two maps. That effort pinpointed the precise gene expression network that a drug would need to target to combat COVID-19 in elderly patients.
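As a rough illustration of finding overlap between two interaction maps, the sketch below treats each map as a set of undirected edges and keeps only the interactions that appear in both. The article does not say how the team computed the overlap, so plain set intersection is an assumption made here for clarity, and the protein labels P1 through P5 are placeholders rather than real proteins.

```python
def undirected_edges(pairs):
    """Store each interaction as a frozenset so (a, b) and (b, a) are equal."""
    return {frozenset(pair) for pair in pairs}

# Hypothetical interaction maps with placeholder protein labels
aging_edges = undirected_edges([("P1", "P2"), ("P2", "P3"), ("P3", "P4")])
infection_edges = undirected_edges([("P2", "P1"), ("P3", "P4"), ("P4", "P5")])

# Interactions present in both maps
shared_edges = aging_edges & infection_edges

# Proteins that take part in at least one shared interaction
shared_nodes = set().union(*shared_edges)

print(sorted(sorted(edge) for edge in shared_edges))
print(sorted(shared_nodes))
```

Storing each edge as a frozenset makes the comparison independent of the order in which the two proteins are listed, which matches the undirected nature of the network at this stage.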
"At this point, we had an undirected network," says Belyaeva, meaning the researchers had yet to identify which genes and proteins were "upstream" (i.e. they have cascading effects on the expression of other genes) and which were "downstream" (i.e. their expression is altered by prior changes in the network). An ideal drug candidate would target the genes at the upstream end of the network to minimize the impacts of infection.
"We want to identify a drug that has an effect on all of these differentially expressed genes downstream," says Belyaeva. So the team used algorithms that infer causality in interacting systems to turn their undirected network into a causal network. The final causal network identified RIPK1 as a target gene/protein for potential COVID-19 drugs, since it has numerous downstream effects. The researchers identified a list of the approved drugs that act on RIPK1 and may have potential to treat COVID-19. Previously these drugs have been approved for the use in cancer. Other drugs that were also identified, including ribavirin and quinapril, are already in clinical trials for COVID-19.
Uhler plans to share the team's findings with pharmaceutical companies. She emphasizes that before any of the drugs they identified can be approved for repurposed use in elderly COVID-19 patients, clinical testing is needed to determine efficacy. While this particular study focused on COVID-19, the researchers say their framework is extendable. "I'm really excited that this platform can be more generally applied to other infections or diseases," says Belyaeva. Radhakrishnan emphasizes the importance of gathering information on how various diseases impact gene expression. "The more data we have in this space, the better this could work," he says.