December 7, 2022

Exploring text-to-audio models to make music from scratch

The algorithm transforms a text prompt into audio. Credit: Zach Evans

Type a few words into a text-to-image model, and you'll end up with a weirdly accurate, completely unique picture. While this tool is fun to play with, it also opens up avenues of creative application and exploration and provides workflow-enhancing tools for visual artists and animators. For musicians, sound designers, and other audio professionals, a text-to-audio model would do the same.

As part of the 183rd Meeting of the Acoustical Society of America, Zach Evans, of Stability AI, presented progress toward this end in his talk, "Musical audio samples generated from joint text embeddings."

"Text-to-image models use deep neural networks to generate original, novel images based on learned semantic correlations with text captions," said Evans. "When trained on a large and varied data set of captioned images, they can be used to create almost any image that can be described, as well as modify images supplied by the user."

A text-to-audio model would be able to do the same, but with music as the end result. Among other applications, it could be used to create sound effects for video games or samples for music production.

But training these deep learning models is more difficult than training their image counterparts.

"One of the main difficulties with training a text-to-audio model is finding a large enough data set of text-aligned audio to train on," said Evans. "Outside of speech data, research data sets available for text-aligned audio tend to be much smaller than those available for text-aligned images."

Evans and his team, including Belmont University's Scott Hawley, have shown early success in generating coherent and relevant music and sound from text. They employed data compression methods to generate the audio with reduced training time and improved output quality.

The researchers plan to expand to larger data sets and release their model as an open-source option for other researchers, developers, and audio professionals to use and improve.

More information: Conference: acousticalsociety.org/asa-meetings/

Provided by Acoustical Society of America

Citation: Exploring text-to-audio models to make music from scratch (2022, December 7) retrieved 24 April 2024 from https://techxplore.com/news/2022-12-exploring-text-to-audio-music.html

