March 15, 2023

An energy-efficient text-to-audio AI

Generative artificial intelligence (AI) systems will inspire an explosion of creativity in the music industry and beyond, according to the University of Surrey researchers who are inviting the public to test out their new text-to-audio model.

AudioLDM is a new AI-based system from Surrey that allows users to submit a text prompt, which is then used to generate a corresponding audio clip. The system can process prompts and deliver clips using less computational power than current AI systems without compromising sound quality or the users' ability to manipulate clips.

The general public can try out AudioLDM by visiting its Hugging Face space. The code is also open source on GitHub, where it has more than 1,000 stars.

Such a system could be used by sound designers in a variety of applications, such as film-making, game design, digital art, virtual reality, the metaverse, and digital assistants for the visually impaired.

Haohe Liu, project lead from the University of Surrey, said, "Generative AI has the potential to transform every sector, including music and sound creation."

"With AudioLDM, we show that anyone can create high-quality and unique samples in seconds with very little computing power. While there are some legitimate concerns about the technology, there is no doubt that AI will open doors for many within these creative industries and inspire an explosion of new ideas."

Audio output for "A squirrel whistles while chewing gum." Credit: AudioLDM

Surrey's open-sourced model is built in a semi-supervised way with a method called Contrastive Language-Audio Pretraining (CLAP). Using the CLAP method, AudioLDM can be trained on massive amounts of diverse audio data without text labeling, significantly improving model capacity.
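To give a rough sense of what "contrastive" pretraining means here, the sketch below computes a symmetric contrastive loss over paired audio and text embeddings. It is not taken from the AudioLDM or CLAP code: the function names, the toy two-dimensional embeddings, and the temperature value of 0.07 are all illustrative assumptions, and real systems compute the embeddings with neural encoders rather than hand-written vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss (hypothetical helper, assumed temperature).

    audio_embs[i] and text_embs[i] are assumed to describe the same clip.
    The loss is low when each audio embedding is most similar to its own
    text embedding, and vice versa.
    """
    logits = [
        [cosine_similarity(a, t) / temperature for t in text_embs]
        for a in audio_embs
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    audio_to_text = cross_entropy_diagonal(logits)
    text_to_audio = cross_entropy_diagonal(logits_transposed)
    return 0.5 * (audio_to_text + text_to_audio)

# Toy embeddings: pair 0 and pair 1 point in clearly different directions.
audio = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]

matched = contrastive_loss(audio, text)
mismatched = contrastive_loss(audio, text[::-1])  # captions swapped
print(matched < mismatched)
```

Correctly paired embeddings give a much lower loss than swapped ones, which is the signal a model of this kind would be trained to minimize so that audio and text end up in a shared embedding space.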

Wenwu Wang, professor in signal processing and machine learning at the University of Surrey, said, "What makes AudioLDM special is not just that it can create sound clips from text prompts, but that it can create new sounds based on the same text without requiring retraining."

"This saves time and resources since it doesn't require additional training. As generative AI becomes part and parcel of our daily lives, it's important that we start thinking about the energy required to power up the computers that run these technologies. AudioLDM is a step in the right direction."

The user community has created a variety of music clips using AudioLDM in different genres.

AudioLDM is a research demonstrator project and relies on the current UK copyright exception for data mining for non-commercial research. The paper is published on the arXiv preprint server.

More information: Haohe Liu et al, AudioLDM: Text-to-Audio Generation with Latent Diffusion Models, arXiv (2023). DOI: 10.48550/arxiv.2301.12503


Provided by University of Surrey

Citation: An energy-efficient text-to-audio AI (2023, March 15) retrieved 20 April 2024 from https://techxplore.com/news/2023-03-energy-efficient-text-to-audio-ai.html

