May 4, 2023

AI training: A backward cat pic is still a cat pic

Genes make up only a small fraction of the human genome. Between them lie long stretches of DNA that tell cells when, where, and how much each gene should be used. These biological instruction manuals are known as regulatory motifs. If that sounds complex, well, it is.

The instructions for gene regulation are written in a complicated code, and scientists have turned to artificial intelligence to crack it. To learn the rules of DNA regulation, they're using deep neural networks (DNNs), which excel at finding patterns in large datasets. DNNs are at the core of popular AI tools like ChatGPT. Thanks to a new tool developed by Cold Spring Harbor Laboratory Assistant Professor Peter Koo, genome-analyzing DNNs can now be trained with far more data than can be obtained through experiments alone.

"With DNNs, the mantra is the more data, the better," Koo says. "We really need these models to see a diversity of genomes so they can learn robust motif signals. But in some situations, the biology itself is the limiting factor, because we can't generate more data than exists inside the cell."

If an AI learns from too few examples, it may misinterpret how a regulatory motif impacts gene function. The problem is that some motifs are uncommon, so very few examples of them are found in nature.

To overcome this limitation, Koo and his colleagues developed EvoAug—a new method of augmenting the data used to train DNNs. EvoAug was inspired by a dataset hiding in plain sight—evolution. The process begins by generating artificial DNA sequences that nearly match real sequences found in cells. The sequences are tweaked in the same way genetic mutations have naturally altered the genome during evolution.
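
To make the idea concrete, here is a minimal Python sketch of mutation-style augmentation. It is not EvoAug's actual code or API. The function names, the choice of substitutions, deletions, and insertions as the tweaks, and the default rates are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def substitute(seq, rate, rng):
    """Swap each base for a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def delete_and_pad(seq, max_len, rng):
    """Delete a random stretch, then pad with random bases to keep the length."""
    n = rng.randint(0, min(max_len, len(seq)))
    start = rng.randint(0, len(seq) - n)
    kept = seq[:start] + seq[start + n:]
    pad = "".join(rng.choice(BASES) for _ in range(n))
    return kept + pad

def insert_and_trim(seq, max_len, rng):
    """Insert a random stretch, then trim the end to keep the length."""
    n = rng.randint(0, max_len)
    pos = rng.randint(0, len(seq))
    inserted = "".join(rng.choice(BASES) for _ in range(n))
    return (seq[:pos] + inserted + seq[pos:])[:len(seq)]

def augment(seq, rng, sub_rate=0.05, max_indel=3):
    """Apply one randomly chosen tweak (hypothetical defaults, not from the paper)."""
    ops = [
        lambda s: substitute(s, sub_rate, rng),
        lambda s: delete_and_pad(s, max_indel, rng),
        lambda s: insert_and_trim(s, max_indel, rng),
    ]
    return rng.choice(ops)(seq)

rng = random.Random(0)
real_sequence = "ACGTTGCAACGTTGCAACGT"
artificial = [augment(real_sequence, rng) for _ in range(5)]
```

Each artificial sequence keeps the length of the real one and differs from it by only a few bases, so it can be fed to a model in place of the original during training.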

Next, the models are trained to recognize regulatory motifs using the new sequences, with one key assumption: the vast majority of tweaks will not disrupt the sequences' function. Koo compares augmenting the data in this way to training image-recognition software with mirror images of the same cat. The computer learns that a backward cat pic is still a cat pic.

The reality, Koo says, is that some DNA changes do disrupt function. So, EvoAug includes a second training step using only real biological data. This guides the model "back to the biological reality of the dataset," Koo explains.
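
The two training steps can be sketched as a simple schedule. The Python below is only an illustration under stated assumptions: `two_stage_training`, `train_step`, `point_mutate`, and the epoch counts are hypothetical names and values, not part of EvoAug, and a real model update would replace the recording stub.

```python
import random

def point_mutate(seq, rng):
    """Toy augmentation: change one randomly chosen base (illustrative only)."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def two_stage_training(train_step, dataset, augment_fn,
                       aug_epochs, finetune_epochs, seed=0):
    """Train on tweaked sequences first, then on the real data only."""
    rng = random.Random(seed)
    # Stage 1: every sequence is tweaked, but its label is kept unchanged.
    # This encodes the assumption that most tweaks do not disrupt function.
    for _ in range(aug_epochs):
        for seq, label in dataset:
            train_step(augment_fn(seq, rng), label)
    # Stage 2: only real sequences, pulling the model back toward the
    # biological data in case some tweaks did disrupt function.
    for _ in range(finetune_epochs):
        for seq, label in dataset:
            train_step(seq, label)

# Stand-in for a real model update: record what the model would see.
seen = []
dataset = [("ACGTACGT", 1), ("TTGACCAA", 0)]
two_stage_training(lambda s, y: seen.append((s, y)), dataset,
                   point_mutate, aug_epochs=2, finetune_epochs=1)
```

In this toy run the model sees four tweaked examples followed by the two real ones, with labels carried over unchanged throughout.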

Koo's team found that models trained with EvoAug perform better than those trained on biological data alone. As a result, scientists could soon get a better read of the regulatory DNA that writes the rules of life itself. Ultimately, this could provide a whole new understanding of human health.

The research was published in Genome Biology.

More information: Nicholas Keone Lee et al, EvoAug: improving generalization and interpretability of genomic deep neural networks with evolution-inspired data augmentations, Genome Biology (2023). DOI: 10.1186/s13059-023-02941-w

Journal information: Genome Biology

Provided by Cold Spring Harbor Laboratory

Citation: AI training: A backward cat pic is still a cat pic (2023, May 4) retrieved 6 July 2024 from https://techxplore.com/news/2023-05-ai-cat-pic.html
